perm filename V2H.IN[TEX,DEK] blob
sn#359279 filedate 1977-07-18 generic text, type C, neo UTF8
folio 344 galley 1
W58320 Computer Programming (Knuth/Addison-Wesley) f.344 ch.4 g.1b

It is a straightforward matter to apply the classical algorithms for integers
to problems involving numbers with embedded radix points, or rational numbers,
or extended-precision floating-point numbers, in the same way as the arithmetic
operations defined for integers in MIX are applied to these more general
problems.

In this section we shall study algorithms which do operations (a), (b), and (c)
above for integers expressed in radix $b$ notation, where $b$ is any given
integer $\ge 2$. Thus the algorithms are quite general definitions of
arithmetic processes, and as such they are unrelated to any particular
computer. But the discussion in this section will also be somewhat
machine-oriented, since we are chiefly concerned with efficient methods for
doing high-precision calculations by computer. Although our examples are based
on the mythical MIX computer, essentially the same considerations apply to
nearly every other machine. For convenience, let us assume first that we have a
computer (like MIX) which uses the signed-magnitude representation for
numbers; suitable modifications for complement notations are discussed near
the end of this section.

The most important fact to understand about extended-precision numbers is
that they may be regarded as numbers written in radix $w$ notation, where $w$
is the computer's word size. For example, an integer which fills 10 words on a
computer whose word size is $w = 10^{10}$ has 100 decimal digits; but we will
consider it to be a 10-place number to the base $10^{10}$. This viewpoint is
justified for the same reason that we may convert, say, from binary to octal
notation, simply by grouping the bits together. (See Eq. 4.1–5.)
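
The grouping argument can be tried out directly. The Python sketch below (the helper names `to_radix_places` and `from_radix_places` are hypothetical, introduced only for this illustration) splits a decimal numeral into base-$10^k$ places, most significant first, and evaluates such a list back to an integer; with $k = 10$ a 100-digit number becomes a 10-place number, as in the example above.

```python
def to_radix_places(decimal: str, k: int) -> list[int]:
    """Split a decimal numeral into base-10**k places, most significant first.

    Grouping k decimal digits at a time is the same idea as converting
    binary to octal by grouping bits in threes.
    """
    digits = decimal.lstrip("0") or "0"
    pad = (-len(digits)) % k          # left-pad so the length is a multiple of k
    digits = "0" * pad + digits
    return [int(digits[i:i + k]) for i in range(0, len(digits), k)]


def from_radix_places(places: list[int], k: int) -> int:
    """Evaluate (w_1 w_2 ... w_n) in radix 10**k."""
    value = 0
    for w in places:
        value = value * 10**k + w
    return value


x = "9" * 100                                   # a 100-digit integer
places = to_radix_places(x, 10)
print(len(places))                              # 10
print(from_radix_places(places, 10) == int(x))  # True
```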

In these terms, we are given the following primitive operations to work with:

  a₀) addition or subtraction of one-place integers, giving a one-place answer
      and a carry;
  b₀) multiplication of a one-place integer by another one-place integer,
      giving a two-place answer;
  c₀) division of a two-place integer by a one-place integer, provided that
      the quotient is a one-place integer, and yielding also a one-place
      remainder.

By adjusting the word size, if necessary, nearly all computers will have these
three operations available, and so we will construct our algorithms (a), (b),
and (c) mentioned above in terms of the primitive operations (a₀), (b₀), and
(c₀).

Since we are visualizing extended-precision integers as base $b$ numbers, it
is sometimes helpful to think of the situation when $b = 10$, and to imagine
that we are doing the arithmetic by hand. Then operation (a₀) is analogous to
memorizing the addition table; (b₀) is analogous to memorizing the
multiplication table; and (c₀) is essentially memorizing the multiplication
table in reverse. The more complicated operations (a), (b), (c) on
high-precision numbers can now be done using the simple addition, subtraction,
multiplication, and long-division procedures we are taught in elementary
school. In fact, most of the algorithms we shall discuss in this section are
essentially only mechanizations of familiar pencil-and-paper operations. Of
course, we must state the algorithms much more precisely than they have ever
been stated in the fifth grade, and we should also attempt to minimize
computer memory and running time requirements.
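
A minimal Python model of the three primitives follows, assuming a radix `B = 10` so the results can be checked against the hand-arithmetic analogy; the function names are hypothetical, and Python's unbounded integers stand in for the single- and double-length registers a real machine would use. The guard in `div_two_by_one` is the proviso of (c₀): a two-place number divided by $y$ has a one-place quotient exactly when its high place is less than $y$.

```python
B = 10  # assumed radix b; 10 mimics pencil-and-paper arithmetic


def add_one_place(x: int, y: int) -> tuple[int, int]:
    """(a0) Add two one-place integers: returns (one-place sum, carry)."""
    s = x + y
    return s % B, s // B


def sub_one_place(x: int, y: int) -> tuple[int, int]:
    """(a0) Subtract one-place integers: returns (one-place difference, borrow).
    The borrow is 0 or -1, because Python's // and % round toward -infinity."""
    d = x - y
    return d % B, d // B


def mul_one_place(x: int, y: int) -> tuple[int, int]:
    """(b0) Multiply one-place integers: returns the two-place product (high, low)."""
    p = x * y
    return p // B, p % B


def div_two_by_one(high: int, low: int, y: int) -> tuple[int, int]:
    """(c0) Divide the two-place integer (high low) by the one-place y,
    returning (one-place quotient, one-place remainder)."""
    if not (0 < y < B and 0 <= high < y):
        raise ValueError("quotient would not be a one-place integer")
    n = high * B + low
    return n // y, n % y


print(add_one_place(7, 8))      # (5, 1)
print(sub_one_place(3, 8))      # (5, -1)
print(mul_one_place(7, 8))      # (5, 6)
print(div_two_by_one(5, 6, 7))  # (8, 0)
```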

To avoid a tedious discussion and cumbersome notations, let us assume that all
numbers we deal with are nonnegative. The additional work of computing the
signs, etc., is quite straightforward, and the reader will find it easy to fill
in any details of this sort.

First comes addition, which of course is very simple, but it is worth studying
since the same ideas occur in the other algorithms also:

Algorithm A (Addition of nonnegative integers). Given nonnegative $n$-place
integers $u_1u_2\ldots u_n$ and $v_1v_2\ldots v_n$ with radix $b$, this
algorithm forms their sum, $(w_0w_1w_2\ldots w_n)_b$. (Here $w_0$ is the
"carry," and it will always be equal to 0 or 1.)

A1. [Initialize.] Set $j \leftarrow n$, $k \leftarrow 0$. (The variable $j$
    will run through the various digit positions, and the variable $k$ keeps
    track of carries at each step.)

A2. [Add digits.] Set $w_j \leftarrow (u_j + v_j + k) \bmod b$, and
    $k \leftarrow \lfloor (u_j + v_j + k)/b \rfloor$. (In other words, $k$ is
    set to 1 or 0, depending on whether a "carry" occurred or not, i.e.,
    whether $u_j + v_j + k \ge b$ or not. At most one carry is possible during
    the two additions, since we always have
    $u_j + v_j + k \le (b-1) + (b-1) + 1 < 2b$.)

A3. [Loop on $j$.] Decrease $j$ by one. Now if $j > 0$, go back to step A2;
    otherwise set $w_0 \leftarrow k$ and terminate the algorithm.

For a formal proof that Algorithm A is valid, see exercise 4.
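
A direct Python transcription of Algorithm A might look like this. It is a sketch that assumes digits are held in a list, most significant first, so that `u[j - 1]` plays the role of $u_j$; the name `add_nonnegative` is hypothetical.

```python
def add_nonnegative(u: list[int], v: list[int], b: int) -> list[int]:
    """Algorithm A: return [w0, w1, ..., wn] with (w0 w1 ... wn)_b = u + v.

    u and v are n-place radix-b numbers, most significant digit first,
    so u[j - 1] holds the digit the text calls u_j.
    """
    n = len(u)
    if len(v) != n:
        raise ValueError("u and v must have the same number of places")
    w = [0] * (n + 1)
    k = 0                              # A1: no carry yet
    for j in range(n, 0, -1):          # A1/A3: j = n, n-1, ..., 1
        s = u[j - 1] + v[j - 1] + k    # A2: s <= 2b - 1, so k becomes 0 or 1
        w[j] = s % b
        k = s // b
    w[0] = k                           # A3: the final carry is w0
    return w


print(add_nonnegative([9, 1, 4], [0, 8, 7], 10))  # [1, 0, 0, 1]
```

The example adds 914 and 087 in radix 10; the leading 1 is the carry $w_0$.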

A MIX program for this addition process might take the following form:

Program A (Addition of nonnegative integers). Let LOC($u_j$) ≡ U + $j$,
LOC($v_j$) ≡ V + $j$, LOC($w_j$) ≡ W + $j$, rI1 ≡ $j$, rA ≡ $k$,
word size ≡ $b$, N ≡ $n$.

  01        ENT1  N       1        A1. Initialize. j ← n.
  02        JOV   OFLO    1        Ensure overflow is off.
  03  1H    ENTA  0       N+1-K    k ← 0.
  04        J1Z   3F      N+1-K    To A3 if j = 0.
  05  2H    ADD   U,1     N        A2. Add digits.
  06        ADD   V,1     N
  07        STA   W,1     N
  08        DEC1  1       N        A3. Loop on j.
  09        JNOV  1B      N        If no overflow, set k ← 0.
  10        ENTA  1       K        Otherwise, set k ← 1.
  11        J1P   2B      K        To A2 if j ≠ 0.
  12  3H    STA   W       1        Store final carry in w_0.

The running time for this program is $10N + 6$ cycles, independent of the
number of carries, $K$. The quantity $K$ is analyzed in detail at the close of
this section.

Many modifications of Algorithm A are possible, and only a few of these are
mentioned in the exercises below. A chapter on generalizations of this
algorithm might be entitled, "How to design adding circuits for a digital
computer."

The problem of subtraction is similar to addition, but the differences are
worth noting:

Algorithm S (Subtraction of nonnegative integers). Given nonnegative $n$-place
integers $u_1u_2\ldots u_n \ge v_1v_2\ldots v_n$ with radix $b$, this
algorithm forms their nonnegative difference, $(w_1w_2\ldots w_n)_b$.

S1. [Initialize.] Set $j \leftarrow n$, $k \leftarrow 0$.

S2. [Subtract digits.] Set $w_j \leftarrow (u_j - v_j + k) \bmod b$, and
    $k \leftarrow \lfloor (u_j - v_j + k)/b \rfloor$. (In other words, $k$ is
    set to $-1$ or $0$, depending on whether a "borrow" occurred or not, i.e.,
    whether $u_j - v_j + k < 0$ or not. In the calculation of $w_j$ note that
    we must have
    $$-b = 0 - (b-1) + (-1) \le u_j - v_j + k \le (b-1) - 0 + 0 < b;$$
    hence $0 \le u_j - v_j + k + b < 2b$, and this suggests the method of
    computer implementation explained below.)

S3. [Loop on $j$.] Decrease $j$ by one. Now if $j > 0$, go back to step S2;
    otherwise terminate the algorithm. (When the algorithm terminates, we
    should have $k = 0$; the condition $k = -1$ will occur if and only if
    $v_1\ldots v_n > u_1\ldots u_n$, and this is contrary to the given
    assumptions. See exercise 12.)
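
Algorithm S can be transcribed the same way. This sketch (hypothetical name `subtract_nonnegative`, digits most significant first) forms each digit from $u_j - v_j + k + b$, the quantity that step S2 shows always lies in $[0, 2b)$; reporting the case $v > u$ by raising `ValueError` is an assumption of this illustration, not part of the algorithm.

```python
def subtract_nonnegative(u: list[int], v: list[int], b: int) -> list[int]:
    """Algorithm S: return [w1, ..., wn] with (w1 ... wn)_b = u - v, given u >= v.

    Digits are most significant first. Following the remark in step S2, each
    digit is formed from u_j - v_j + k + b, which always lies in [0, 2b).
    """
    n = len(u)
    if len(v) != n:
        raise ValueError("u and v must have the same number of places")
    w = [0] * n
    k = 0                                  # S1: k is 0 or -1 (a pending borrow)
    for j in range(n, 0, -1):              # S1/S3: j = n, n-1, ..., 1
        t = u[j - 1] - v[j - 1] + k + b    # S2, shifted by b: 0 <= t < 2b
        w[j - 1] = t % b                   # equals (u_j - v_j + k) mod b
        k = t // b - 1                     # 0 if t >= b, otherwise -1 (borrow)
    if k != 0:                             # k = -1 exactly when v > u (exercise 12)
        raise ValueError("v > u, contrary to the assumption u >= v")
    return w


print(subtract_nonnegative([9, 1, 4], [0, 8, 7], 10))  # [8, 2, 7]
```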

In a MIX program to implement subtraction, it is most convenient to retain the
value $1 + k$ instead of $k$ throughout the algorithm, so that we can calculate
$u_j - v_j + (1 + k) + (b - 1)$ in step S2. (Recall that $b$ is the word
size.) This is illustrated in the following code:

Program S (Subtraction of nonnegative integers). This program is analogous to
Program A; we have rI1 ≡ $j$, rA ≡ $1 + k$. Here, as in other programs of this
section, location WM1 contains $b - 1$, the largest possible value that can be
stored in a MIX word; cf. Program 4.2.3D, lines 38–3…

  01        ENT1  N       1        S1. Initialize. j ← n.
  02        JOV   OFLO    1        Ensure overflow is off.
  03  1H    J1Z   DONE    K+1      Terminate if j = 0.
  04        ENTA  1       K        Set k ← 0.
  05  2H    ADD   U,1     N        S2. Subtract digits.
  06        SUB   V,1     N        Compute u_j − v_j + k + b.
  07        ADD   WM1     N
  08        STA   W,1     N        (May be minus zero.)
  09        DEC1  1       N        S3. Loop on j.
  10        JOV   1B      N        If overflow, set k ← 0.
  11        ENTA  0       N-K      Otherwise, set k ← −1.
  12        J1P   2B      N-K      Back to S2.
  13        …
folio 347 galley 2

The running time for this program is $12N + 3$ cycles, which is slightly
longer than that for Program A.

The reader may wonder if it would not be worth while to have a combined
addition-subtraction routine in place of the two algorithms A and S. Study of
the computer programs shows that it is generally better to use two different
routines, so that the inner loop of the computation can be performed as
rapidly as possible, since the programs are so short.

Our next problem is multiplication, and here we carry the ideas used in
Algorithm A a little further:

Algorithm M (Multiplication of nonnegative integers). Given nonnegative
integers $u_1u_2\ldots u_n$ and $v_1v_2\ldots v_m$ with radix $b$, this
algorithm forms their product $(w_1w_2\ldots w_{m+n})_b$. (The conventional
pencil-and-paper method is based on forming the partial products
$(u_1u_2\ldots u_n) \times v_j$ first, for $1 \le j \le m$, and then adding
these products together with appropriate scale factors; but in a computer it
is best to do the addition concurrently with the multiplication, as described
in this algorithm.)

M1. [Initialize.] Set $w_{m+1}, w_{m+2}, \ldots, w_{m+n}$ all to zero. Set
    $j \leftarrow m$. (If $w_{m+1}, \ldots, w_{m+n}$ were not cleared to zero
    in this step, we would have a more general algorithm which sets
    $(w_1\ldots w_{m+n}) \leftarrow (u_1\ldots u_n) \times (v_1\ldots v_m)
    + (w_{m+1}\ldots w_{m+n})$.)

M2. [Zero multiplier?] If $v_j = 0$, set $w_j \leftarrow 0$ and go to step
    M6. (This test saves a good deal of time if there is a reasonable chance
    that $v_j$ is zero, but otherwise it may be omitted without affecting the
    validity of the algorithm.)

M3. [Initialize $i$.] Set $i \leftarrow n$, $k \leftarrow 0$.

M4. [Multiply and add.] Set $t \leftarrow u_i \times v_j + w_{i+j} + k$; then
    set $w_{i+j} \leftarrow t \bmod b$, $k \leftarrow \lfloor t/b \rfloor$.
    (Here the "carry" $k$ will always be in the range $0 \le k < b$; see
    below.)

M5. [Loop on $i$.] Decrease $i$ by one. Now if $i > 0$, go back to step M4;
    otherwise set $w_j \leftarrow k$.

M6. [Loop on $j$.] Decrease $j$ by one. Now if $j > 0$, go back to step M2;
    otherwise the algorithm terminates.

Algorithm M is illustrated in Table 1, assuming that $b = 10$, by showing the
states of the computation at the beginning of steps M5 and M6. A proof of
Algorithm M appears in the answer to exercise 14.

The two inequalities
$$0 \le t < b^2, \qquad 0 \le k < b \tag{1}$$
are crucial for an efficient implementation of this algorithm, since they
point out how large a register is needed for the computations. These
inequalities may be proved by induction as the algorithm proceeds, for if we
have $k < b$ at the start of step M4, we have
$$u_i \times v_j + w_{i+j} + k \le (b-1)\times(b-1) + (b-1) + (b-1)
  = b^2 - 1 < b^2.$$

Table 1
MULTIPLICATION OF 914 BY 84.

  Step   i   j   u_i   v_j    t    w_1  w_2  w_3  w_4  w_5
  M5     3   2    4     4    16     x    x    0    0    6
  M5     2   2    1     4    05     x    x    0    5    6
  M5     1   2    9     4    36     x    x    6    5    6
  M6     0   2    x     4    36     x    3    6    5    6
  M5     3   1    4     8    37     x    3    6    7    6
  M5     2   1    1     8    17     x    3    7    7    6
  M5     1   1    9     8    76     x    6    7    7    6
  M6     0   1    x     8    76     7    6    7    7    6
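
The same computation can be sketched in Python. This transcription of Algorithm M assumes digit lists stored most significant first and keeps an unused slot `w[0]` so that `w[i + j]` matches the subscript $w_{i+j}$ in the text; the name `multiply_nonnegative` is hypothetical. Running it on $914 \times 84$ ends in the state shown in the last row of Table 1.

```python
def multiply_nonnegative(u: list[int], v: list[int], b: int) -> list[int]:
    """Algorithm M: return [w1, ..., w_{m+n}] with (w1 ... w_{m+n})_b = u * v.

    u has n places and v has m places, most significant first; w[0] is an
    unused slot so that w[i + j] matches the subscript w_{i+j} in the text.
    """
    n, m = len(u), len(v)
    # M1: w_{m+1}, ..., w_{m+n} must start at zero; w_1, ..., w_m are
    # overwritten before they are ever read.
    w = [0] * (m + n + 1)
    for j in range(m, 0, -1):       # M1/M6: j = m, m-1, ..., 1
        if v[j - 1] == 0:           # M2: optional zero-multiplier shortcut
            w[j] = 0
            continue
        k = 0                       # M3: i <- n, k <- 0
        for i in range(n, 0, -1):   # M3/M5: i = n, n-1, ..., 1
            t = u[i - 1] * v[j - 1] + w[i + j] + k   # M4: 0 <= t < b*b
            w[i + j] = t % b
            k = t // b                               # 0 <= k < b, by (1)
        w[j] = k                    # M5: the last carry becomes w_j
    return w[1:]


print(multiply_nonnegative([9, 1, 4], [8, 4], 10))  # [7, 6, 7, 7, 6]
```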

The following MIX program shows the considerations which are necessary when
Algorithm M is implemented on a computer. The coding for step M4 would be a
little simpler if our computer had a "multiply-and-add" instruction, or if it
had a double-length accumulator for addition.

Program M (Multiplication of nonnegative integers). This program is analogous
to Program A. rI1 ≡ $i$, rI2 ≡ $j$, rI3 ≡ $i + j$, CONTENTS(CARRY) ≡ $k$.

  01        ENT1  N         1         M1. Initialize.
  02        JOV   OFLO      1         Ensure overflow is off.
  03        STZ   W+M,1     N         w_{m+i} ← 0.
  04        DEC1  1         N
  05        J1P   *-2       N         Repeat for n ≥ i > 0.
  06        ENT2  M         1         j ← m.
  07  1H    LDX   V,2       M         M2. Zero multiplier?
  08        JXZ   8F        M         If v_j = 0, set w_j ← 0 and go to M6.
  09        ENT1  N         M-Z       M3. Initialize i.
  10        ENT3  N,2       M-Z       i ← n, (i + j) ← n + j.
  11        ENTX  0         M-Z       k ← 0.
  12  2H    STX   CARRY     (M-Z)N    M4. Multiply and add.
  13        LDA   U,1       (M-Z)N    u_i
  14        MUL   V,2       (M-Z)N      × v_j
  15        SLC   5         (M-Z)N    Interchange rA ↔ rX.
  16        ADD   W,3       (M-Z)N    Add w_{i+j} to lower half.
  17        JNOV  *+2       (M-Z)N    Did overflow occur?
  18        INCX  1         K         If so, carry one into upper half.
  19        ADD   CARRY     (M-Z)N    Add k to lower half.
  20        JNOV  *+2       (M-Z)N    Did overflow occur?
  21        INCX  1         K′        If so, carry one into upper half.
  22        STA   W,3       (M-Z)N    w_{i+j} ← t mod b.
  23        DEC1  1         (M-Z)N    M5. Loop on i.
  24        DEC3  1         (M-Z)N    Decrease i and (i + j) by one.
  25        J1P   2B        (M-Z)N    Back to M4 if i > 0; rX = ⌊t/b⌋.
  26  8H    STX   W,2       M         Set w_j ← k.
  27        DEC2  1         M         M6. Loop on j.
  28        J2P   1B        M         Repeat until j = 0.

The execution time of Program M depends on the number of places, $M$, in the
multiplier; the number of places, $N$, in the multiplicand; the number of
zeros, $Z$, in the multiplier; and the number of carries, $K$ and $K'$, which
occur during the addition to the lower half of the product in the computation
of $t$. If we approximate both $K$ and $K'$ by the reasonable (although
somewhat pessimistic) values $\tfrac12(M - Z)N$, we find that the total
running time comes to $28MN + 10M + 4N + 3 - Z(28N + 3)$ cycles. If step M2
were deleted, the running time would be $28MN + 7M + 4N + 3$ cycles, so this
step is not advantageous unless the density of zero positions within the
multiplier is $Z/M > 3/(28N + 3)$. If the multiplier is chosen completely at
random, this ratio $Z/M$ is expected to be only about $1/b$, which is
extremely small; so step M2 is generally not worth while.
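
The break-even condition just quoted comes from subtracting the two running-time formulas. Written out (the labels $T_{\text{M2}}$ and $T_{\text{no M2}}$ for the two totals are ours, not the text's):

$$
\begin{aligned}
T_{\text{M2}} &= 28MN + 10M + 4N + 3 - Z(28N + 3),\\
T_{\text{no M2}} &= 28MN + 7M + 4N + 3,\\
T_{\text{M2}} - T_{\text{no M2}} &= 3M - Z(28N + 3),\\
T_{\text{M2}} < T_{\text{no M2}} &\iff \frac{Z}{M} > \frac{3}{28N + 3}.
\end{aligned}
$$

The $3M$ term is the cost of performing the test once per multiplier place, and $Z(28N + 3)$ is what the test saves. For instance, with $N = 10$ the threshold is $3/283 \approx 0.0106$, far above the expected density $1/b$ of zero digits for any realistic word size.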

Algorithm M is not the fastest way to multiply when $m$ and $n$ are large,
although it has the advantage of simplicity. Speedier methods are discussed in
Section 4.3.3; even when $m = n = 4$, it is possible to multiply numbers in a
little less time than is required by Algorithm M.

The final algorithm of concern to us in this section is long division, in
which we want to divide $(m+n)$-place integers by $n$-place integers. Here the
ordinary pencil-and-paper method involves a certain amount of guesswork and
ingenuity on the part of the person doing the division; we must either
eliminate this guesswork from the algorithm or develop some theory to explain
it more carefully.

A moment's reflection about the ordinary process of long division shows that
the general problem breaks down into simpler steps, each of which is the
division of an $(n+1)$-place number $u$ by the $n$-place divisor $v$, where
$0 \le u/v < b$; the remainder $r$ after each step is less than $v$, so we may
use $rb + (\text{next place of dividend})$ as the new $u$ in the succeeding
step. For example, if we are asked to divide 3142 by 47, we first divide 314
by 47, getting 6 and a remainder of 32; then we divide 322 by 47, getting 6
and a remainder of 40; thus we have a quotient of 66 and a remainder of 40. It
is clear that this same idea works in general, and so our search for an
appropriate division algorithm reduces to the following problem (Fig. 6):

    Let $u = u_0u_1\ldots u_n$ and $v = v_1v_2\ldots v_n$ be nonnegative
    integers in radix $b$ notation, such that $u/v < b$. Find an algorithm to
    determine $q = \lfloor u/v \rfloor$.

Fig. 6. Wanted: a way to determine $q$ rapidly.
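
The reduction of long division to a sequence of one-digit steps can be sketched in Python as follows. The name `long_divide` is hypothetical, and Python's exact `divmod` stands in for the single step of Fig. 6, which is precisely the part the rest of this section sets out to do efficiently; the sketch only shows how the steps chain together.

```python
def long_divide(u_digits: list[int], v: int, b: int = 10):
    """Schoolbook division of (u_1 u_2 ... u_k)_b by v, one quotient place at a time.

    Each step divides u = r*b + (next place of the dividend) by v, where the
    previous remainder satisfies 0 <= r < v, so 0 <= u/v < b and the quotient
    digit fits in one place.  Python's exact divmod stands in for the
    single-step division whose efficient design is the subject of the text.
    Returns (quotient digits, final remainder, list of (u, q, r) steps).
    """
    if v <= 0:
        raise ValueError("the divisor must be positive")
    r = 0
    quotient, steps = [], []
    for digit in u_digits:
        u = r * b + digit          # bring down the next place
        q, r = divmod(u, v)        # q < b because the old r was < v
        assert 0 <= q < b
        quotient.append(q)
        steps.append((u, q, r))
    return quotient, r, steps


quotient, remainder, steps = long_divide([3, 1, 4, 2], 47)
print(quotient, remainder)  # [0, 0, 6, 6] 40
print(steps[2:])            # [(314, 6, 32), (322, 6, 40)]
```

The first two steps (3 and 31 divided by 47) merely produce leading zeros; the two remaining steps are exactly the ones worked by hand in the text.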
folio 350 galley 3 WARNING: Much of this tape unreadable!

We may observe that the condition $u/v < b$ is equivalent to the condition
that $u/b < v$; i.e., $\lfloor u/b \rfloor < v$; and this is the condition
that $u_0u_1\ldots u_{n-1} < v_1v_2\ldots v_n$. Furthermore, if we write
$r = u - qv$, then $q$ is the unique integer such that $0 \le r < v$.

The most obvious approach to this problem is to make a guess about $q$, based
on the most significant digits of $u$ and $v$. It isn't obvious that such a
method will be reliable enough, but it is worth investigating; let us
therefore set
$$\hat q = \min\left(\left\lfloor \frac{u_0b + u_1}{v_1} \right\rfloor,\; b - 1\right). \tag{2}$$
This formula says $\hat q$ is obtained by dividing the two leading digits of
$u$ by the leading digit of $v$; and if the result is $b$ or more we can
replace it by $(b - 1)$.

It is a remarkable fact, which we will now investigate, that this value
$\hat q$ is always a very good approximation to the desired answer $q$, so long
as $v_1$ is reasonably large. In order to analyze how close $\hat q$ comes to
$q$, we will first prove that $\hat q$ is never too small.

Theorem A. In the notation above, $\hat q \ge q$.

Proof. Since $q \le b - 1$ (because $u/v < b$), the theorem is certainly true
if $\hat q = b - 1$. Suppose therefore that $\hat q < b - 1$; it follows that
$\hat q = \lfloor (u_0b + u_1)/v_1 \rfloor$, hence
$\hat q v_1 \ge u_0b + u_1 - v_1 + 1$. Therefore
$$\begin{aligned}
u - \hat q v &\le u - \hat q v_1 b^{n-1}\\
&\le u_0b^n + \cdots + u_n - (u_0b^n + u_1b^{n-1} - v_1b^{n-1} + b^{n-1})\\
&= u_2b^{n-2} + \cdots + u_n - b^{n-1} + v_1b^{n-1} < v_1b^{n-1} \le v.
\end{aligned}$$
Since $u - \hat q v < v$, we must have $\hat q \ge q$.

We will now prove that $\hat q$ cannot be much larger than $q$ in practical
situations. Assume that $\hat q \ge q + 3$. We have
$$\hat q \le \frac{u_0b + u_1}{v_1} = \frac{u_0b^n + u_1b^{n-1}}{v_1b^{n-1}}
  \le \frac{u}{v_1b^{n-1}} < \frac{u}{v - b^{n-1}}.$$
(The case $v = b^{n-1}$ is impossible, for if $v = (100\ldots0)_b$ then
$q = \hat q$.) Furthermore, since $q > (u/v) - 1$,
$$3 \le \hat q - q < \frac{u}{v - b^{n-1}} - \frac{u}{v} + 1
  = \frac{u}{v}\left(\frac{b^{n-1}}{v - b^{n-1}}\right) + 1.$$
Therefore
$$\frac{u}{v} > 2\left(\frac{v - b^{n-1}}{b^{n-1}}\right) \ge 2(v_1 - 1).$$
Finally, since $b - 4 \ge \hat q - 3 \ge q = \lfloor u/v \rfloor \ge 2(v_1 - 1)$,
we have $v_1 < \lfloor b/2 \rfloor$. This proves Theorem B:

Theorem B. If $v_1 \ge \lfloor b/2 \rfloor$, then $\hat q - 2 \le q \le \hat q$.

The most important part of this theorem is that the conclusion is independent
of $b$; no matter how large $b$ is, the trial quotient $\hat q$ will never be
more than 2 in error.

The condition that $v_1 \ge \lfloor b/2 \rfloor$ is very much like a
normalization condition (in fact, it is exactly the condition of normalization
in a binary computer). One simple way to ensure that $v_1$ is sufficiently
large is to multiply both $u$ and $v$ by $\lfloor b/(v_1 + 1) \rfloor$; that
does not change the value of $u/v$, nor does it increase the number of places
in $v$, and exercise 23 proves that it will always make the new value of $v_1$
large enough. (Note: For another way to normalize the divisor, see
exercise 28.)
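
Theorems A and B, and the normalization claim, can be checked exhaustively for small radices. The sketch below (helper names hypothetical) evaluates the trial quotient (2) for every admissible pair $u$, $v$ with $n = 2$ and tests the bounds; a finite search is of course not a proof, only an illustration of what the theorems assert.

```python
def trial_quotient(u: list[int], v: list[int], b: int) -> int:
    """Eq. (2): q-hat = min(floor((u0*b + u1) / v1), b - 1).

    u = [u0, u1, ..., un] has n+1 places and v = [v1, ..., vn] has n places.
    """
    return min((u[0] * b + u[1]) // v[0], b - 1)


def to_digits(x: int, places: int, b: int) -> list[int]:
    """Radix-b digits of x, most significant first, padded to `places`."""
    out = []
    for _ in range(places):
        x, d = divmod(x, b)
        out.append(d)
    return out[::-1]


def check_theorems(b: int, n: int) -> tuple[bool, bool, int]:
    """Exhaustively test Theorems A and B for all n-place v (v1 != 0) and all
    u with u/v < b.  Returns (A holds, B holds, largest q-hat - q observed
    among divisors with v1 >= floor(b/2))."""
    a_holds = b_holds = True
    worst = 0
    for v in range(b ** (n - 1), b ** n):
        vd = to_digits(v, n, b)
        for u in range(b * v):                 # exactly the u with u/v < b
            q = u // v
            qhat = trial_quotient(to_digits(u, n + 1, b), vd, b)
            a_holds = a_holds and qhat >= q
            if vd[0] >= b // 2:
                b_holds = b_holds and qhat - 2 <= q
                worst = max(worst, qhat - q)
    return a_holds, b_holds, worst


def normalization_works(b: int, n: int) -> bool:
    """Check the claim of exercise 23 for every n-place v: multiplying by
    d = floor(b/(v1+1)) keeps n places and makes the leading digit >= floor(b/2)."""
    for v in range(b ** (n - 1), b ** n):
        d = b // (v // b ** (n - 1) + 1)
        if not ((b // 2) * b ** (n - 1) <= v * d < b ** n):
            return False
    return True


print(check_theorems(10, 2)[:2])   # (True, True)
print(normalization_works(10, 2))  # True
# Without normalization the guess can be far off: u = 100, v = 19, radix 10.
print(trial_quotient([1, 0, 0], [1, 9], 10) - 100 // 19)  # 4
```

The last line shows why the condition on $v_1$ matters: with $v_1 = 1$ the trial quotient is 9 while the true quotient is 5.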

Now that we have armed ourselves with all of these facts, we are in a position
to write the desired long-division algorithm. This algorithm uses a slightly
improved choice of $\hat q$ in step D3, which guarantees that $q = \hat q$ or
$\hat q - 1$; in fact, the improved choice of $\hat q$ made here is almost
always accurate.

Algorithm D (Division of nonnegative integers). Given nonnegative integers
$u = u_1u_2\ldots u_{m+n}$ and $v = v_1v_2\ldots v_n$ with radix $b$, where
$v_1 \ne 0$ and $n > 1$, we form the quotient
$\lfloor u/v \rfloor = (q_0q_1\ldots q_m)_b$ and the remainder
$u \bmod v = (r_1r_2\ldots r_n)_b$. (This notation is slightly different from
that used in the above proofs. When $n = 1$, the simpler algorithm of
exercise 16 should be used.)

D1. [Normalize.] Set $d \leftarrow \lfloor b/(v_1 + 1) \rfloor$. Set
    $u_0u_1u_2\ldots u_{m+n}$ equal to $u_1u_2\ldots u_{m+n}$ times $d$. Set
    $v_1v_2\ldots v_n$ equal to $v_1v_2\ldots v_n$ times $d$. (Note the
    introduction of the new digit position $u_0$ at the left of $u_1$; if
    $d = 1$, all we need to do in this step is to set $u_0 \leftarrow 0$. On a
    binary computer it may be preferable to choose $d$ to be a power of 2
    instead of using the value suggested here; any value of $d$ which results
    in $v_1 \ge \lfloor b/2 \rfloor$ will suffice here.)

D2. [Initialize $j$.] Set $j \leftarrow 0$. (The loop on $j$, steps D2 through
    D7, will be essentially a division of $u_ju_{j+1}\ldots u_{j+n}$ by
    $v_1v_2\ldots v_n$ to get a single quotient digit $q_j$; cf. Fig. 6.)

D3. [Calculate $\hat q$.] If $u_j = v_1$, set $\hat q \leftarrow b - 1$;
    otherwise set $\hat q \leftarrow \lfloor (u_jb + u_{j+1})/v_1 \rfloor$.
    Now test if $v_2\hat q > (u_jb + u_{j+1} - \hat q v_1)b + u_{j+2}$; if so,
    decrease $\hat q$ by 1 and repeat this test. (The latter test determines
    at high speed most of the cases in which the trial value $\hat q$ is one
    too large, and it eliminates all cases where $\hat q$ is two too large;
    see exercises 19, 20, 21.)

D4. [Multiply and subtract.] Replace $u_ju_{j+1}\ldots u_{j+n}$ by
    $u_ju_{j+1}\ldots u_{j+n}$ minus ($\hat q$ times $v_1v_2\ldots v_n$). This
    step (analogous to steps M3 to M5 of Algorithm M) consists of a simple
    multiplication by a one-place number, combined with a subtraction. The
    digits $u_ju_{j+1}\ldots u_{j+n}$ should be kept positive; if the result
    of this step is actually negative, $u_ju_{j+1}\ldots u_{j+n}$ should be
    left as the true value plus $b^{n+1}$, i.e., as the $b$'s complement of
    the true value, and a "borrow" to the left should be remembered.

Fig. 7. Long division.

D5. [Test remainder.] Set $q_j \leftarrow \hat q$. If the result of step D4
    was negative, go to step D6; otherwise go on to step D7.

D6. [Add back.] (The probability that this step is necessary is very small,
    on the order of only $3/b$; see exercise 21. Test data which activates
    this step should therefore be specifically contrived when debugging.)
    Decrease $q_j$ by 1, and add $0v_1v_2\ldots v_n$ to
    $u_ju_{j+1}u_{j+2}\ldots u_{j+n}$. (A carry will occur to the left of
    $u_j$, and it should be ignored since it cancels with the "borrow" which
    occurred in D4.)

D7. [Loop on $j$.] Increase $j$ by one. Now if $j \le m$, go back to D3.

D8. [Unnormalize.] Now $q_0q_1\ldots q_m$ is the desired quotient, and the
    desired remainder may be obtained by dividing $u_{m+1}\ldots u_{m+n}$ by
    $d$.
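
A Python transcription of Algorithm D follows, offered as a sketch under these assumptions: digits are lists, most significant first; Python integers hold intermediate values such as $v_2\hat q$ without overflow, so the D3 test is coded exactly as stated (a word-sized implementation must take more care); and the helper names `mul_digit`, `div_digit`, and `divide_nonnegative` are hypothetical. Running it with a small radix such as $b = 3$ reaches step D6 often, which is one way to follow the advice in D6 about contriving test data.

```python
def mul_digit(x: list[int], d: int, b: int) -> list[int]:
    """Multiply the digit list x (most significant first) by one place d.
    The caller guarantees the product still fits in len(x) places."""
    out, carry = [0] * len(x), 0
    for i in range(len(x) - 1, -1, -1):
        t = x[i] * d + carry
        out[i], carry = t % b, t // b
    assert carry == 0, "product does not fit"
    return out


def div_digit(x: list[int], d: int, b: int) -> list[int]:
    """Divide the digit list x by one place d (exact in step D8)."""
    out, rem = [], 0
    for digit in x:
        t = rem * b + digit
        out.append(t // d)
        rem = t % d
    assert rem == 0, "unnormalization should be exact"
    return out


def divide_nonnegative(u: list[int], v: list[int], b: int):
    """Algorithm D: return ([q0, ..., qm], [r1, ..., rn]) for
    u = (u1 ... u_{m+n})_b and v = (v1 ... vn)_b with v1 != 0 and n > 1."""
    n, m = len(v), len(u) - len(v)
    if n < 2 or v[0] == 0 or m < 0:
        raise ValueError("need n > 1, v1 != 0, and at least n places in u")
    d = b // (v[0] + 1)                      # D1. Normalize; u gains place u0.
    u = mul_digit([0] + u, d, b)
    v = mul_digit(v, d, b)
    q = [0] * (m + 1)
    for j in range(m + 1):                   # D2 and D7: j = 0, 1, ..., m
        # D3. Calculate q-hat, then apply the v2 test repeatedly.
        if u[j] == v[0]:
            qhat = b - 1
        else:
            qhat = (u[j] * b + u[j + 1]) // v[0]
        while v[1] * qhat > (u[j] * b + u[j + 1] - qhat * v[0]) * b + u[j + 2]:
            qhat -= 1
        # D4. Subtract qhat * (0 v1 ... vn) from u_j ... u_{j+n}.
        carry = borrow = 0
        for i in range(n, -1, -1):
            p = qhat * (v[i - 1] if i > 0 else 0) + carry
            carry = p // b
            t = u[j + i] - p % b + borrow
            u[j + i], borrow = t % b, t // b     # borrow is 0 or -1
        q[j] = qhat                          # D5. Test remainder.
        if borrow < 0:
            # D6. Add back (0 v1 ... vn); the final carry cancels the borrow.
            q[j] -= 1
            carry = 0
            for i in range(n, -1, -1):
                t = u[j + i] + (v[i - 1] if i > 0 else 0) + carry
                u[j + i], carry = t % b, t // b
    return q, div_digit(u[m + 1:], d, b)     # D8. Unnormalize the remainder.


print(divide_nonnegative([3, 1, 4, 2], [4, 7], 10))  # ([0, 6, 6], [4, 0])
```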

The representation of Algorithm D as a MIX program has several points of
interest:

Program D (Division of nonnegative integers). The conventions of this program
are analogous to Program A; rI1 ≡ $i$, rI2 ≡ $j - m$, rI3 ≡ $i + j$. Steps D1
and D8 have been left as exercises.
    001  D1   JOV   OFLO         1           D1. Normalize.
                    ...                      (See exercise 25)
    039  D2   ENN2  M            1           D2. Initialize j.
    040       STZ   V            1           Set v_0 ← 0, for convenience in D4.
    041  D3   LDA   U+M,2(1:5)   M+1         D3. Calculate q̂.
    042       LDX   U+M+1,2      M+1         rAX ← u_j b + u_{j+1}.
    043       DIV   V+1          M+1         rA ← ⌊rAX/v_1⌋.
    044       JOV   1F           M+1         Jump if quotient = b.
    045       STA   QHAT         M+1         q̂ ← rA.
    046       STX   RHAT         M+1         r̂ ← u_j b + u_{j+1} − q̂ v_1
    047       JMP   2F           M+1           = (u_j b + u_{j+1}) mod v_1.
    048  1H   LDX   WM1                      rX ← b − 1.
    049       LDA   U+M+1,2                  rA ← u_{j+1}. (Here u_j = v_1.)
    050       JMP   4F
    051  3H   LDX   QHAT         E
                    ...
    070       MUL   QHAT         (N+1)(M+1)  rAX ← −q̂ v_i.
    071       SLC   5            (N+1)(M+1)  Interchange rA ↔ rX.
    072       ADD   CARRY        (N+1)(M+1)  Add the contribution from the
    073       JNOV  *+2          (N+1)(M+1)    digit to the right, plus 1.
    074       DECX  1            K           If sum is ≤ −b, carry −1.
    075       ADD   U,3          (N+1)(M+1)  Add u_{i+j}.
    076       ADD   WM1          (N+1)(M+1)  Add b − 1 to force + sign.
    077       JNOV  *+2          (N+1)(M+1)  If no overflow, carry −1.
    078       INCX  1            K′          rX ≡ carry + 1.
    079       STA   U,3          (N+1)(M+1)  u_{i+j} ← rA (may be minus zero).
    080       DEC1  1            (N+1)(M+1)
    081       DEC3  1            (N+1)(M+1)
    082       J1NN  2B           (N+1)(M+1)  Repeat for n ≥ i ≥ 0.
    083  D5   LDA   QHAT         M+1         D5. Test remainder.
    084       STA   Q+M,2        M+1         Set q_j ← q̂.
    085       JXP   D7           M+1         (Here rX = 0 or 1, since v_0 = 0.)
    086  D6   DECA  1                        D6. Add back.
    087       STA   Q+M,2                    Set q_j ← q̂ − 1.
    088       ENT1  N                        i ← n.
    089       ENT3  M+N,2                    (i + j) ← n + j.
    090  1H   ENTA  0                        (This is essentially Program A.)
    091  2H   ADD   U,3
    092       ADD   V,1
    093       STA   U,3
    094       DEC1  1
    095       DEC3  1
    096       JNOV  1B
    097       ENTA  1
    098       J1P   2B                       (Not necessary to add to u_j.)
    099  D7   INC2  1            M+1         D7. Loop on j.
    100       J2NP  D3           M+1         Repeat for 0 ≤ j ≤ m.
    101  D8   ...                            (See exercise 26)
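For experimenting with Algorithm D away from MIX, here is a minimal Python sketch of steps D1 through D8. It assumes the following:

- Digit lists are stored most significant digit first.
- $n \ge 2$ and $v_1 \ne 0$.
- The helper names scale, from_digits, and divide are hypothetical, chosen only for illustration.

Python's unbounded integers and its floor-style // and % stand in for the overflow tests that Program D uses to manage carries and borrows. The sketch therefore mirrors the logic of the algorithm rather than its machine-level details.

```python
def scale(x, d, b):
    """Multiply the digit list x (most significant digit first) by the single
    digit d in radix b, keeping the same length; the caller guarantees that
    no carry is left over at the top."""
    out = list(x)
    carry = 0
    for i in range(len(out) - 1, -1, -1):
        t = out[i] * d + carry
        out[i], carry = t % b, t // b
    assert carry == 0
    return out


def from_digits(digits, b):
    """Value of a radix-b digit list, most significant digit first."""
    value = 0
    for digit in digits:
        value = value * b + digit
    return value


def divide(u, v, b):
    """Sketch of Algorithm D: u has m+n digits, v has n >= 2 digits with
    v[0] != 0 (most significant first). Returns m+1 quotient digits and
    n remainder digits."""
    n, m = len(v), len(u) - len(v)
    assert n >= 2 and m >= 0 and v[0] != 0
    # D1. Normalize; un[0] is the extra leading digit u_0.
    d = b // (v[0] + 1)
    un = scale([0] + list(u), d, b)
    vn = scale(v, d, b)
    q = [0] * (m + 1)
    for j in range(m + 1):                    # D2 and D7: loop on j
        # D3. Trial quotient, then the test involving v_2.
        top = un[j] * b + un[j + 1]
        qhat = b - 1 if un[j] == vn[0] else top // vn[0]
        while vn[1] * qhat > (top - qhat * vn[0]) * b + un[j + 2]:
            qhat -= 1
        # D4. Multiply and subtract qhat * (v_1 ... v_n) from un[j..j+n].
        k = 0                                 # combined carry/borrow (<= 0)
        for i in range(n, 0, -1):
            t = un[j + i] - qhat * vn[i - 1] + k
            un[j + i], k = t % b, t // b      # floor division keeps digits in [0, b)
        t = un[j] + k
        un[j] = t % b                         # b's complement when t < 0
        # D5. Test remainder.
        q[j] = qhat
        if t < 0:
            # D6. Add back; the carry out of un[j] is ignored, cancelling the borrow.
            q[j] -= 1
            c = 0
            for i in range(n, 0, -1):
                s = un[j + i] + vn[i - 1] + c
                un[j + i], c = s % b, s // b
            un[j] = (un[j] + c) % b
    # D8. Unnormalize: divide the last n digits of un by d.
    r, rem = [], 0
    for digit in un[m + 1:]:
        t = rem * b + digit
        r.append(t // d)
        rem = t % d
    assert rem == 0                           # exact, as exercise 27 asks to prove
    return q, r
```

For example, divide([1, 2, 3, 4], [5, 6], 10) returns ([0, 2, 2], [0, 2]), that is, $1234 = 22 \cdot 56 + 2$. Unlike lines 044 and 056 of Program D, the sketch does not take any shortcut out of the D3 test. It simply repeats the test as step D3 states it.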
Note how easily the rather complex-appearing calculations and decisions of step D3 can be handled inside the machine. Note also that the program for step D4 is analogous to Program M, except that the ideas of Program S have also been incorporated. In step D6, use has been made of the facts that $v_0 = 0$ and that $u_j$ is not needed in the subsequent calculations; a strict interpretation of Algorithm D would require line 098 to be "J1NN 2B".

The running time for Program
D can be estimated by considering the quantities $M$, $N$, $E$, $K$, and $K'$ shown in the program. (These quantities ignore several situations which can occur only with very low probability; for example, we may assume that lines 048–050, 063–064, and step D6 are never executed.) Here $M + 1$ is the number of words in the quotient; $N$ is the number of words in the divisor; $E$ is the number of times $\hat q$ is adjusted downwards in step D3; $K$ and $K'$ are the numbers of times certain "carry" adjustments are made during the multiply-subtract loop. If we assume that $K + K'$ is approximately $(N+1)(M+1)$, and that $E$ is approximately $\frac12 M$, we get a total running time of approximately

$$30MN + 30N + 89M + 111$$

cycles, plus $67N + 235M + 4$ more if $d > 1$. (The program segments of exercises 25 and 26 are included in these totals.) When $M$ and $N$ are large, this is only about seven percent longer than the time Program M takes to multiply the quotient by the divisor.

Further commentary on Algorithm D appears in the exercises at the close of this section.

It is possible to debug
programs for multiple-precision arithmetic by using the multiplication and addition routines to check the result of the division routine, etc. The following type of test data is occasionally useful:

$$(t^m - 1)(t^n - 1) = t^{m+n} - t^n - t^m + 1.$$

If $m < n$, this number has the radix-$t$ expansion

$$\underbrace{(t-1)\ldots(t-1)}_{m-1\ \text{places}}\,(t-2)\,\underbrace{(t-1)\ldots(t-1)}_{n-m\ \text{places}}\,\underbrace{0\ldots0}_{m-1\ \text{places}}\,1;$$

for example, $(10^3 - 1)(10^5 - 1) = 99899001$.
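The digit pattern above can be checked mechanically before such numbers are used as test operands. The following Python sketch compares the radix-$t$ expansion of $(t^m-1)(t^n-1)$ with the pattern for several radices, assuming $1 \le m < n$ as in the text. The helper names digits_of and predicted_digits are hypothetical.

```python
def digits_of(x, t, width):
    """The radix-t digits of 0 <= x < t**width, most significant first."""
    out = []
    for _ in range(width):
        x, digit = divmod(x, t)
        out.append(digit)
    return out[::-1]


def predicted_digits(t, m, n):
    """The expansion of (t**m - 1)(t**n - 1) stated in the text, for 1 <= m < n."""
    return ([t - 1] * (m - 1) + [t - 2] + [t - 1] * (n - m)
            + [0] * (m - 1) + [1])


for t in (2, 3, 10, 2**16):
    for n in range(2, 8):
        for m in range(1, n):
            product = (t**m - 1) * (t**n - 1)
            assert digits_of(product, t, m + n) == predicted_digits(t, m, n)

print(digits_of(99899001, 10, 8))   # prints [9, 9, 8, 9, 9, 0, 0, 1]
```

The comparison uses a fixed width of $m + n$ digits because the leading digit $t - 2$ is zero when $m = 1$ and $t = 2$. A division routine can then be exercised on the same products. Dividing by $t^n - 1$ should give quotient $t^m - 1$ and remainder 0.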
In the case of Program D, it is also necessary to find some test cases which cause the rarely executed parts of the program to be used; some portions of that program would probably never get tested even if a million random test cases were tried.

Now that we have seen how to operate with signed-magnitude numbers, let us consider what approach should be taken to the same problems when a computer with complement notation is being used. For two's complement and ones' complement notations, it is best to let the radix $b$ be one-half the word size; thus for a 32-bit computer word we would use $b = 2^{31}$ in the above algorithms.
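The following minimal Python sketch illustrates this convention. It assumes a hypothetical machine with $w$-bit two's complement words and radix $b = 2^{w-1}$. The function names split_words, join_words, and bit_pattern are illustrative only.

- split_words keeps only the most significant word signed and makes every other word a nonnegative digit less than $b$, so its sign bit is zero.
- join_words reassembles the value.
- bit_pattern shows each word as a $w$-bit string.

With $w = 10$ and three words, the integer $-33819413$ splits into the words $-130$, $506$, $235$.

```python
def split_words(x, w, k):
    """Represent integer x in k words of a w-bit two's complement machine,
    using radix b = 2**(w-1): only the most significant word is signed."""
    b = 1 << (w - 1)
    low = []
    for _ in range(k - 1):
        x, digit = divmod(x, b)   # floor division: 0 <= digit < b even when x < 0
        low.append(digit)
    if not -b <= x < b:
        raise OverflowError("x does not fit in k words")
    return [x] + low[::-1]


def join_words(words, w):
    """Inverse of split_words: top*b**(k-1) + ... + last word."""
    b = 1 << (w - 1)
    value = 0
    for word in words:
        value = value * b + word
    return value


def bit_pattern(word, w):
    """The w-bit two's complement bit string of a single word."""
    return format(word & ((1 << w) - 1), "0{}b".format(w))


words = split_words(-33819413, 10, 3)
print(words)                                        # [-130, 506, 235]
print([bit_pattern(word, 10) for word in words])    # ['1101111110', '0111111010', '0011101011']
```

These bit patterns are the words of the 10-bit example discussed in the text, with each lower word's zero sign bit written out. For a 32-bit machine one would call split_words(x, 32, k).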
The sign bit of all but the most significant word of a multiple-precision number will be zero, so that no anomalous sign correction takes place during the computer's multiplication and division operations. In fact, the basic meaning of complement notation requires that we consider all but the most significant word to be nonnegative. For example, assuming a 10-bit word, the two's complement number

$$1101111110\qquad 111111010\qquad 011101011$$

(where the sign is given only for the most significant word) is properly thought of as

$$-2^{27} + (101111110)_2 \cdot 2^{18} + (111111010)_2 \cdot 2^{9} + (011101011)_2.$$

Addition
of signed numbers is slightly easier when complement notations are being used, since the routine for adding $n$-place nonnegative integers can be used for arbitrary $n$-place integers; the sign appears only in the first word, so the less significant words may be added together irrespective of the actual sign. (Special attention must be given to the leftmost carry when ones' complement notation is being used, however; it must be added into the least significant word, and possibly propagated further to the left.) Similarly, we find that subtraction of signed numbers is slightly simpler with complement notation. On the other hand, multiplication and division seem to be done most easily by working with nonnegative quantities and doing suitable complementation operations beforehand to make sure both operands are nonnegative. It may be possible to avoid this complementation by devising some tricks for working directly with negative numbers in a complement notation, and it is not hard to see how this could be done in double-precision multiplication, but care should be taken not to slow down the inner loops of the subroutines when high precision is required. Note that the product of two $m$-place numbers in two's complement notation may require $2m + 1$ places: the square of $-b^m$ is $b^{2m}$.

Let
us now turn to an analysis of the quantity $K$ that arises in Program A, i.e., the number of carries that occur when $n$-place numbers are being added together. This quantity $K$ plays no part in the total running time of Program A, but it does affect the running time of the counterparts of Program A that deal with complement notations, and its analysis is interesting in itself as a significant application of generating functions.

Suppose now that $u$ and $v$ are independent random $n$-place integers, uniformly distributed in the range $0 \le u, v < b^n$. Let $p_{nk}$ be the probability that exactly $k$ carries occur in the addition of $u$ to $v$, and that one of these carries occurred in the most significant position (so that $u + v \ge b^n$). Similarly, let $q_{nk}$ be the probability that exactly $k$ carries occur, but there is no carry in the most significant position. Then it is not hard to see that

$$\begin{aligned}
p_{0k} &= 0, \qquad q_{0k} = \delta_{0k}, \qquad \text{for all } k;\\
p_{(n+1)(k+1)} &= \frac{b+1}{2b}\,p_{nk} + \frac{b-1}{2b}\,q_{nk},\\
q_{(n+1)k} &= \frac{b-1}{2b}\,p_{nk} + \frac{b+1}{2b}\,q_{nk};
\end{aligned}\tag{3}$$

this happens because $(b-1)/(2b)$ is the probability that $u_1 + v_1 \ge b$, and $(b+1)/(2b)$ is the probability that $u_1 + v_1 + 1 \ge b$, when $u_1$ and $v_1$ are independently and uniformly distributed integers in the range $0 \le u_1, v_1 < b$.

To obtain further information
about these quantities $p_{nk}$ and $q_{nk}$, we may set up the generating functions

$$P(z,t) = \sum_{k,n} p_{nk}\, z^k t^n, \qquad Q(z,t) = \sum_{k,n} q_{nk}\, z^k t^n; \tag{4}$$

from (3) we have the basic relations

$$\begin{aligned}
P(z,t) &= zt\left(\frac{b+1}{2b}\,P(z,t) + \frac{b-1}{2b}\,Q(z,t)\right),\\
Q(z,t) &= 1 + t\left(\frac{b-1}{2b}\,P(z,t) + \frac{b+1}{2b}\,Q(z,t)\right).
\end{aligned}$$

These two equations are readily solved for $P(z,t)$ and $Q(z,t)$; and if we let

$$G(z,t) = P(z,t) + Q(z,t) = \sum_n G_n(z)\, t^n,$$

where $G_n(z)$ is the generating function for the total number of carries when $n$-place numbers are added, we find that

$$G(z,t) = \frac{b - zt}{p(z,t)}, \qquad \text{where } p(z,t) = b - \tfrac12(1+b)(1+z)\,t + zt^2. \tag{5}$$

Note that $G(1,t) = 1/(1-t)$, and this checks with the fact that $G_n(1)$ must equal 1 (it is the sum of all the possible probabilities). Taking partial derivatives of (5) with respect to $z$, we find that

$$\begin{aligned}
\frac{\partial G}{\partial z} &= \sum_n G_n'(z)\,t^n = \frac{-t}{p(z,t)} + \frac{t(b-zt)\bigl(\tfrac12(b+1) - t\bigr)}{p(z,t)^2},\\
\frac{\partial^2 G}{\partial z^2} &= \sum_n G_n''(z)\,t^n = \frac{-t^2(b+1-2t)}{p(z,t)^2} + \frac{t^2(b-zt)(b+1-2t)^2}{2\,p(z,t)^3}.
\end{aligned}$$
Now let us put $z = 1$ and expand in partial fractions:

$$\begin{aligned}
\sum_n G_n'(1)\,t^n &= \frac{t}{2}\left(\frac{1}{(1-t)^2} - \frac{1}{(b-1)(1-t)} + \frac{1}{(b-1)(b-t)}\right),\\
\sum_n G_n''(1)\,t^n &= \frac{t^2}{2}\left(\frac{1}{(1-t)^3} - \frac{1}{(b-1)^2(1-t)} + \frac{1}{(b-1)^2(b-t)} + \frac{1}{(b-1)(b-t)^2}\right).
\end{aligned}$$
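These manipulations are easy to cross-check numerically. The following Python sketch iterates recurrence (3) in exact rational arithmetic to obtain the distribution of the number of carries. It then compares the mean and variance with the closed forms (6) and (7) that follow from these expansions. The names carry_distribution, mean_closed, and variance_closed are illustrative only.

```python
from fractions import Fraction


def carry_distribution(n, b):
    """Exact probabilities Pr(K = k) for the number K of carries when two
    independent uniform n-place radix-b integers are added, via recurrence (3)."""
    hi = Fraction(b + 1, 2 * b)   # Pr(u1 + v1 + 1 >= b): carry out, given a carry in
    lo = Fraction(b - 1, 2 * b)   # Pr(u1 + v1 >= b): carry out, given no carry in
    p, q = {}, {0: Fraction(1)}   # p_{0k} = 0, q_{0k} = delta_{0k}
    for _ in range(n):
        new_p, new_q = {}, {}
        for k, prob in p.items():     # lower places ended with a carry
            new_p[k + 1] = new_p.get(k + 1, 0) + hi * prob
            new_q[k] = new_q.get(k, 0) + lo * prob
        for k, prob in q.items():     # lower places ended without a carry
            new_p[k + 1] = new_p.get(k + 1, 0) + lo * prob
            new_q[k] = new_q.get(k, 0) + hi * prob
        p, q = new_p, new_q
    dist = {}
    for table in (p, q):
        for k, prob in table.items():
            dist[k] = dist.get(k, 0) + prob
    return dist


def mean_closed(n, b):
    """Formula (6)."""
    x = Fraction(1, b) ** n
    return Fraction(1, 2) * (n - (1 - x) / (b - 1))


def variance_closed(n, b):
    """Formula (7)."""
    x = Fraction(1, b) ** n
    c = Fraction(1, (b - 1) ** 2)
    return Fraction(1, 4) * (n + Fraction(2 * n, b - 1) - (2 * b + 1) * c
                             + (2 * b + 2) * c * x - c * x * x)


for b in (2, 3, 10):
    for n in range(1, 9):
        dist = carry_distribution(n, b)
        mean = sum(k * pr for k, pr in dist.items())
        var = sum(k * k * pr for k, pr in dist.items()) - mean ** 2
        assert sum(dist.values()) == 1
        assert mean == mean_closed(n, b) and var == variance_closed(n, b)
```

Exact fractions make the comparison an identity check rather than a tolerance test. For small $n$ and $b$, the same distribution can also be confirmed by enumerating all pairs $(u, v)$.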
It follows that the average number of carries, i.e., the mean value of $K$, is

$$G_n'(1) = \frac12\left(n - \frac{1}{b-1}\left(1 - \left(\frac1b\right)^{n}\right)\right); \tag{6}$$

the variance is

$$G_n''(1) + G_n'(1) - G_n'(1)^2 = \frac14\left(n + \frac{2n}{b-1} - \frac{2b+1}{(b-1)^2} + \frac{2b+2}{(b-1)^2}\left(\frac1b\right)^{n} - \frac{1}{(b-1)^2}\left(\frac1b\right)^{2n}\right). \tag{7}$$

So the number of carries is just slightly less than $\frac12 n$ under these assumptions.

History and Bibliography. The early history of the classical algorithms described in this section is left as an interesting project for the reader, and only the history of their implementation on computers will be traced here.
The use of $10^n$ as an assumed radix when multiplying large numbers on a desk calculator was discussed by D. N. Lehmer and J. P. Ballantine, AMM 30 (1923), 67–69.

Double-precision arithmetic on computers was first treated by J. von Neumann and H. H. Goldstine [J. von Neumann, Collected Works 5, 142–151]. Theorems A and B above are due to D. A. Pope and M. L. Stein [CACM 3 (1960), 652–654]; their article also contains a bibliography of earlier work on double-precision routines. Other ways of choosing the trial quotient $\hat q$ have been discussed by A. G. Cox and H. A. Luther, CACM 4 (1961), 353 [divide by $v_1 + 1$ instead of $v_1$], and by M. L. Stein, CACM 7 (1964), 472–474 [divide by $v_1$ or $v_1 + 1$ according to the magnitude of $v_2$]. Krishnamurthy [CACM 8 (1965), 179–181] showed that examination of the single-precision remainder in the latter method leads to an improvement over Theorem B. Krishnamurthy and Nandi, CACM 10 (1967), 809–813, suggested a way to replace the normalization and unnormalization operations of Algorithm D by a calculation of $\hat q$ based on several leading digits of the operands.

Several other methods for division have been suggested:
(1) "Fourier division" [J. Fourier, Analyse des équations déterminées (Paris, 1831), Sec. 2.21]. This method, which was often used on desk calculators, essentially obtains each new quotient digit by increasing the precision of the divisor and the dividend at each step. Some rather extensive tests by the author have indicated that this method is certainly inferior to the "divide and correct" technique above, but there may be some applications in which Fourier division is practical. See D. H. Lehmer, AMM 33 (1926), 198–206; J. V. Uspensky, Theory of Equations (New York: McGraw-Hill, 1948), 159–164.

(2) "Newton's method" for evaluating the reciprocal of a number was extensively used in early computers when there was no single-precision division instruction. The idea is to find some initial approximation $x_0$ to the number $1/v$, then to let $x_{n+1} = 2x_n - v x_n^2$. This method converges rapidly to $1/v$, since $x_n = (1-\epsilon)/v$ implies that $x_{n+1} = (1-\epsilon^2)/v$.
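The error-squaring property is easy to confirm with exact rational arithmetic. Write $\epsilon_n = 1 - v x_n$. Then $1 - v(2x - vx^2) = (1 - vx)^2$, so each error is exactly the square of the previous one. Starting from any $x_0$ with $0 < x_0 < 2/v$ (so that $|\epsilon_0| < 1$), the iteration therefore converges. In this Python sketch, newton_reciprocal is an illustrative name, and the starting guess $1/10$ for $1/7$ is an arbitrary assumption.

```python
from fractions import Fraction


def newton_reciprocal(v, x0, steps):
    """Return [x_0, x_1, ..., x_steps] with x_{n+1} = 2 x_n - v x_n**2."""
    xs = [Fraction(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append(2 * x - v * x * x)
    return xs


v = Fraction(7)
xs = newton_reciprocal(v, Fraction(1, 10), 5)   # 1/10 is a rough guess for 1/7
errors = [1 - v * x for x in xs]                # epsilon_n, where x_n = (1 - epsilon_n)/v
assert all(e1 == e0 ** 2 for e0, e1 in zip(errors, errors[1:]))
for x, e in zip(xs, errors):
    print(float(x), float(e))
```

The number of correct digits roughly doubles at each step. In a multiple-precision setting one would use fixed-point approximations rather than exact fractions, so each step also introduces a small rounding error.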
Convergence to third order, i.e., with $\epsilon$ replaced by $O(\epsilon^3)$ at each step, can be obtained using the formula

$$\begin{aligned}
x_{n+1} &= x_n + x_n(1 - v x_n) + x_n(1 - v x_n)^2\\
        &= x_n\bigl(1 + (1 - v x_n)(1 + (1 - v x_n))\bigr),
\end{aligned}$$

etc.; see P. Rabinowitz, CACM 4 (1961), 98. For calculations on extremely large numbers, Newton's second-order method (followed by multiplication by $u$) can actually be considerably faster than Algorithm D, if we increase the precision of $x_n$ at each step and if we also use the fast multiplication routines of Section 4.3.3. (See Algorithm 4.3.3D for details.) Some related iterative schemes have been discussed by E. V. Krishnamurthy, IEEE Trans. C-19 (1970), 227–231.

(3) Division methods have also been based on the evaluation of

$$\frac{u}{v+\epsilon} = \frac{u}{v}\left(1 - \left(\frac{\epsilon}{v}\right) + \left(\frac{\epsilon}{v}\right)^2 - \left(\frac{\epsilon}{v}\right)^3 + \cdots\right).$$

See H. H. Laughlin, AMM 37 (1930), 287–293. We have used this idea in the double-precision case (Eq. 4.2.3–(3)).

Besides
the references just cited, the following early articles concerning multiple-precision arithmetic are of interest. High-precision floating-point routines using ones' complement arithmetic are described by A. H. Stroud and D. Secrest, Comp. J. 6 (1963), 62–66. Extended-precision subroutines for use in FORTRAN programs are described by B. I. Blum, CACM 8 (1965), 318–320; and for use in ALGOL by M. Tienari and V. Suokonautio, BIT 6 (1966), 332–338. Arithmetic on integers with unlimited precision, making use of linked memory allocation techniques, has been elegantly described by G. E. Collins, CACM 9 (1966), 578–589. For a much larger repertoire of operations, including logarithms and trigonometric functions, see R. P. Brent, ACM Trans. Math. Software (to appear).

We have restricted our discussion in this section to arithmetic techniques for use in computer programming. There are many algorithms for hardware implementation of arithmetic operations which are very interesting but which appear to be inapplicable to computer programs for high-precision numbers. For example, see:

- G. W. Reitwiesner, "Binary Arithmetic," Advances in Computers 1 (New York: Academic Press, 1960), 231–308;
- O. L. MacSorley, Proc. IRE 49 (1961), 67–91;
- G. Metze, IRE Transactions EC-11 (1962), 76–764;
- H. L. Garner, "Number Systems and Arithmetic," Advances in Computers 6 (New York: Academic Press, 1965), 131–194.

The minimum achievable execution time for hardware addition and multiplication operations has been investigated by S. Winograd, JACM 12 (1965), 277–285, and 14 (1967), 793–802; by R. P. Brent, IEEE Trans. C-19 (1970), 758–759; and by R. W. Floyd, IEEE Symp. Foundations Comp. Sci. 16 (1975), 3–5.
EXERCISES

1. [42] Study the early history of the classical algorithms for arithmetic, by looking up the writings of, say, Sun Tsŭ, al-Khowârizmî, Fibonacci, and Robert Recorde, and by translating their methods as faithfully as possible into more precise algorithmic notation.
2. [15] Generalize Algorithm A so that it does "column addition," i.e., obtains the sum of $m$ nonnegative $n$-place integers. (Assume that $m \le b$.)

3. [21] Write a MIX program for the algorithm of exercise 2, and estimate its running time as a function of $m$ and $n$.

4. [M21] Give a formal proof of the validity of Algorithm A, using the method of "inductive assertions" as explained in Section 1.2.1.

5. [21] Algorithm A adds the two inputs by going from right to left, but sometimes the data is more readily accessible from left to right. Design an algorithm which produces the same answer as Algorithm A, but which generates the digits of the answer from left to right, and goes back to change previous values if a carry occurs to make a previous value incorrect. (Note: Early Hindu and Arabic manuscripts were based on addition from left to right in this way; the right-to-left addition algorithm was a refinement due to later Arabic writers, perhaps because Arabic is written from right to left.)

6. [22] Design an algorithm which adds from left to right (as in exercise 5), but which does not store a digit of the answer until this digit cannot possibly be affected by future carries; there is to be no changing of any answer digit once it has been stored. [Hint: Keep track of the number of consecutive $(b-1)$'s which have not yet been stored in the answer.] This sort of algorithm would be appropriate, for example, in a situation where the input and output numbers are to be read and written from left to right on magnetic tapes.

7. [M26] Determine the average number of times the algorithm of exercise 5 will find that a carry makes it necessary to go back and change $k$ digits of the partial answer, for $k = 1, 2, \ldots, n$. (Assume that both inputs are independently and uniformly distributed between 0 and $b^n - 1$.)
8. [M26] Write a MIX program for the algorithm of exercise 5, and determine its average running time based on the expected number of carries as computed in the text.

9. [21] Generalize Algorithm A to obtain an algorithm which adds two $n$-place numbers in a mixed radix number system, with bases $b_0, b_1, \ldots$ (from right to left). Thus the least significant digits lie between 0 and $b_0 - 1$, the next digits lie between 0 and $b_1 - 1$, etc.; cf. Eq. 4.1–(9).
10. [18] Would Program S work properly if the instructions on lines 06 and 07 were interchanged? If the instructions on lines 05 and 06 were interchanged?

11. [10] Design an algorithm which compares two nonnegative $n$-place integers $u = u_1u_2\ldots u_n$ and $v = v_1v_2\ldots v_n$ with radix $b$, to determine whether $u < v$, $u = v$, or $u > v$.

12. [16] Algorithm S assumes that we know which of the two input operands is the larger; if this information is not known, we could go ahead and perform the subtraction anyway, and we would find that an extra "borrow" is still present at the end of the algorithm. Design another algorithm which could be used (if there is a "borrow" present at the end of Algorithm S) to complement $w_1w_2\ldots w_n$ and therefore to obtain the absolute value of the difference of $u$ and $v$.

13. [21] Write a MIX program which multiplies $(u_1u_2\ldots u_n)_b$ by $v$, where $v$ is a single-precision number (i.e., $0 \le v < b$), producing the answer $(w_0w_1\ldots w_n)_b$. How much running time is required?

14. [M24] Give a formal proof of the validity of Algorithm M, using the method of "inductive assertions" as explained in Section 1.2.1.
15. [M20] If we wish to form the product of two $n$-place fractions, $(.u_1u_2\ldots u_n)_b \times (.v_1v_2\ldots v_n)_b$, and to obtain only an $n$-place approximation $(.w_1w_2\ldots w_n)_b$ to the result, Algorithm M could be used to obtain a $2n$-place answer which is then rounded to the desired approximation. But this involves about twice as much work as is necessary for reasonable accuracy, since the products $u_iv_j$ for $i + j > n + 2$ contribute very little to the answer.

Give an estimate of the maximum error that can occur, if these products $u_iv_j$ for $i + j > n + 2$ are not computed during the multiplication, but are assumed to be zero.

16. [20] Design an algorithm which divides a nonnegative $n$-place integer $u_1u_2\ldots u_n$ by $v$, where $v$ is a single-precision number (i.e., $0 < v < b$), producing the quotient $w_1w_2\ldots w_n$ and remainder $r$.

17. [M20] In the notation of Fig. 6, assume that $v_1 \ge \lfloor b/2 \rfloor$; show that if $u_0 = v_1$, we must have $q = b - 1$ or $b - 2$.

18. [M20] In the notation of Fig. 6, show that if $q' = \lfloor (u_0b + u_1)/(v_1 + 1) \rfloor$, then $q' \le q$.

19. [M21] In the notation of Fig. 6, let $\hat q$ be an approximation to $q$, and let $\hat r = u_0b + u_1 - \hat q v_1$. Assume that $v_1 > 0$. Show that if $v_2\hat q > b\hat r + u_2$, then $q < \hat q$. [Hint: Strengthen the proof of Theorem A by examining the influence of $v_2$.]

20. [M22] Using the notation and assumptions of exercise 19, show that if $v_2\hat q \le b\hat r + u_2$, then $\hat q = q$ or $q = \hat q - 1$.

21. [M23] Show that if $v_1 \ge \lfloor b/2 \rfloor$, and if $v_2\hat q \le b\hat r + u_2$ but $\hat q \ne q$ in the notation of exercises 19 and 20, then $u - qv \ge (1 - 3/b)v$. (The latter event occurs with approximate probability $3/b$, so that when $b$ is the word size of a computer we must have $q_j = \hat q$ in Algorithm D except in very rare circumstances.)

22. [24] Find an example of a four-digit number divided by a three-digit number, using Algorithm D when the radix $b$ is 10, for which step D6 is necessary.
23. [M23] Given that $v$ and $b$ are integers, and that $1 \le v < b$, prove that $\lfloor b/2 \rfloor \le v\lfloor b/(v+1) \rfloor < (v+1)\lfloor b/(v+1) \rfloor \le b$.

24. [M20] Using the law of the distribution of leading digits explained in Section 4.2.4, give an approximate formula for the probability that $d = 1$ in Algorithm D. (When $d = 1$, it is, of course, possible to omit most of the calculation in steps D1 and D8.)

25. [26] Write a MIX routine for step D1, which is needed to complete Program D.

26. [21] Write a MIX routine for step D8, which is needed to complete Program D.

27. [M20] Prove that at the beginning of step D8 in Algorithm D, the number $u_{m+1}u_{m+2}\ldots u_{m+n}$ is always an exact multiple of $d$.

28. [M30] (A. Svoboda, Stroje na Zpracování Informací 9 (1963), 25–32.) Let $v = (v_1v_2\ldots v_n)_b$ be any radix-$b$ integer, where $v_1 \ne 0$. Perform the following operations:

N1. If $v_1 < b/2$, multiply $v$ by $\lfloor (b+1)/(v_1+1) \rfloor$. Let the result of this step be $(v_0v_1v_2\ldots v_n)_b$.

N2. If $v_0 = 0$, set $v \leftarrow v + (1/b)\lfloor b(b - v_1)/(v_1 + 1) \rfloor v$; let the result of this step be $(v_0v_1v_2\ldots v_n.v_{n+1}\ldots)_b$. Repeat step N2 until $v_0 \ne 0$.

Prove that step N2 will be performed at most three times, and that we must always have $v_0 = 1$, $v_1 = 0$ at the end of the calculations.

[Note: If $u$ and $v$ are both multiplied by the above constants, we do not change the value of the quotient $u/v$, and the divisor has been converted into the form $(10v_2\ldots v_n.v_{n+1}v_{n+2}v_{n+3})_b$. This form of the divisor may be very convenient because, in the notation of Algorithm D, we may simply take $\hat q = u_j$ as a trial quotient at the beginning of step D3, or $\hat q = b - 1$ when $(u_{j-1}, u_j) = (1, 0)$.]

29. [15] Prove or disprove: At the beginning of step D7 of Algorithm D, we always have $u_j = 0$.
30. [22] If memory space is limited, it may be desirable to use the same storage locations for both input and output during the performance of some of the algorithms in this section. Is it possible to have $w_1, \ldots, w_n$ stored in the same respective locations as $u_1, \ldots, u_n$ or $v_1, \ldots, v_n$ during Algorithm A or S? Is it possible to have $q_0, \ldots, q_m$ occupy the same locations as $u_0, \ldots, u_m$ in Algorithm D? Is there any permissible overlap of memory locations between input and output in Algorithm M?

31. [28] Assume that $b = 3$ and that $u = (u_1\ldots u_{m+n})_3$, $v = (v_1\ldots v_n)_3$ are integers in balanced ternary notation (cf. Section 4.1), $v_1 \ne 0$. Design a long-division algorithm which divides $u$ by $v$, obtaining a remainder whose absolute value does not exceed $\frac12|v|$. Try to find an algorithm which would be efficient if incorporated into the arithmetic circuitry of a balanced ternary computer.

32. [M40] Assume that $b = 2i$ and that $u$ and $v$ are complex numbers expressed in the quarter-imaginary number system. Design algorithms which divide $u$ by $v$, perhaps obtaining a suitable remainder of some sort, and compare their efficiency. References: M. Nadler, CACM 4 (1961), 192–193; Z. Pawlak and A. Wakulicz, Bull. de l'Acad. Polonaise des Sciences, Classe III, 5 (1957), 233–236 (see also pp. 803–804); and exercise 4.1–15.

33. [M40] Design an algorithm for taking square roots, analogous to Algorithm D and to the pencil-and-paper method for extracting square roots.

34. [40] Develop a set of computer subroutines for doing the four arithmetic operations on arbitrary integers, putting no constraint on the size of the integers except for the implicit assumption that the total memory capacity of the computer should not be exceeded. (Use linked memory allocation, so that no time is wasted in finding room to put the results.)
35. [40] Develop a set of computer subroutines for "decuple-precision floating-point" arithmetic, using excess 0, base $b$, nine-place floating-point number representation, where $b$ is the computer word size, and allowing a full word for the exponent. (Thus each floating-point number is represented in 10 words of memory, and all scaling is done by moving full words instead of shifting within the words.)

36. [M42] Compute the values of the fundamental constants listed in Appendix B to much higher precision than the 40-place values listed there. (Note: The first 100,000 digits of the decimal expansion of $\pi$ were published by D. Shanks and J. W. Wrench, Jr., in Math. Comp. 16 (1962), 76–99.)

*4.3.2. Modular Arithmetic
Another interesting alternative is available for doing arithmetic on large integer numbers, based on some simple principles of number theory. The idea is to have several "moduli" $m_1, m_2, \ldots, m_r$ which contain no common factors, and to work indirectly with "residues" $u \bmod m_1, u \bmod m_2, \ldots, u \bmod m_r$ instead of directly with the number $u$.

For convenience in notation throughout this section, let

$$u_1 = u \bmod m_1, \quad u_2 = u \bmod m_2, \quad \ldots, \quad u_r = u \bmod m_r. \tag{1}$$

It is easy to compute $(u_1, u_2, \ldots, u_r)$ from an integer number $u$ by means of division; and, more important, no information is lost in this process, since we can always recompute $u$ from $(u_1, u_2, \ldots, u_r)$ provided that we know $u$ is not too large. For example, if $0 \le u < v \le 1000$, it is impossible to have $(u \bmod 7, u \bmod 11, u \bmod 13)$ equal to $(v \bmod 7, v \bmod 11, v \bmod 13)$. This is a consequence of the "Chinese Remainder Theorem" stated below.

Therefore we may regard $(u_1, u_2, \ldots, u_r)$ as a new type of internal computer representation, a "modular representation," of the integer $u$.

The advantages of a modular representation are that addition, subtraction, and multiplication are very simple:

$$(u_1, \ldots, u_r) + (v_1, \ldots, v_r) = \bigl((u_1 + v_1) \bmod m_1, \ldots, (u_r + v_r) \bmod m_r\bigr), \tag{2}$$

$$(u_1, \ldots, u_r) - (v_1, \ldots, v_r) = \bigl((u_1 - v_1) \bmod m_1, \ldots, (u_r - v_r) \bmod m_r\bigr), \tag{3}$$

$$(u_1, \ldots, u_r) \times (v_1, \ldots, v_r) = \bigl((u_1 \times v_1) \bmod m_1, \ldots, (u_r \times v_r) \bmod m_r\bigr). \tag{4}$$

It is easy to prove these formulas; for example, to prove (4) we must show that $uv \bmod m_j = (u \bmod m_j)(v \bmod m_j) \bmod m_j$ for each modulus $m_j$. But this is a basic fact of elementary number theory: $x \bmod m_j = y \bmod m_j$ if and only if $x \equiv y$ (modulo $m_j$); furthermore if $x \equiv x'$ and $y \equiv y'$, then $xy \equiv x'y'$ (modulo $m_j$); hence $(u \bmod m_j)(v \bmod m_j) \equiv uv$ (modulo $m_j$).
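The componentwise rules (2), (3), and (4) translate directly into code. The following Python sketch is only an illustration under our own assumptions. The moduli 7, 11, and 13 are borrowed from the example above, the function names are hypothetical, and Python's `%` operator, which returns a value in $[0, m_j)$ for positive $m_j$, supplies the reduction mod $m_j$.

```python
# Modular (residue) representation with componentwise arithmetic,
# following formulas (1)-(4). Illustrative sketch only.
MODULI = (7, 11, 13)   # pairwise relatively prime; m = 7 * 11 * 13 = 1001

def to_residues(u, moduli=MODULI):
    """Formula (1): (u mod m_1, ..., u mod m_r)."""
    return tuple(u % mj for mj in moduli)

def add(u, v, moduli=MODULI):
    """Formula (2): add each pair of residues modulo its own m_j."""
    return tuple((uj + vj) % mj for uj, vj, mj in zip(u, v, moduli))

def sub(u, v, moduli=MODULI):
    """Formula (3): Python's % keeps each component in [0, m_j)."""
    return tuple((uj - vj) % mj for uj, vj, mj in zip(u, v, moduli))

def mul(u, v, moduli=MODULI):
    """Formula (4): no carry passes from one component to another."""
    return tuple((uj * vj) % mj for uj, vj, mj in zip(u, v, moduli))

x, y = to_residues(25), to_residues(38)   # (4, 3, 12) and (3, 5, 12)
print(mul(x, y))                          # (5, 4, 1), the residues of 950
```

Each component is computed without looking at the others. This independence is what the discussion of parallel computers below relies on.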
The disadvantages of a modular representation are that it is comparatively difficult to test whether a number is positive or negative or to test whether or not $(u_1, \ldots, u_r)$ is greater than $(v_1, \ldots, v_r)$. It is also difficult to test whether or not overflow has occurred as the result of an addition, subtraction, or multiplication, and it is even more difficult to perform division. When these operations are required frequently in conjunction with addition, subtraction, and multiplication, the use of modular arithmetic can be justified only if fast means of conversion into and out of the modular representation are available. Therefore conversion between modular and positional notation is one of the principal topics of interest to us in this section.

The processes of addition, subtraction, and multiplication using (2), (3), and (4) are called residue arithmetic or modular arithmetic. The range of numbers that can be handled by modular arithmetic is equal to $m = m_1 m_2 \ldots m_r$, the product of the moduli. When each modulus occupies about one computer word, the number $r$ of moduli grows in proportion to the number $n$ of digits we wish to represent, and each of the $r$ componentwise operations takes a bounded amount of time. Hence the amount of time required to add, subtract, or multiply $n$-digit numbers using modular arithmetic is essentially proportional to $n$ (not counting the time to convert in and out of modular representation). This is no advantage at all when addition and subtraction are considered, but it can be a considerable advantage with respect to multiplication, since the conventional method of the preceding section requires an execution time proportional to $n^2$.

Moreover, on a computer which allows many operations to take place simultaneously, modular arithmetic can be a significant advantage even for addition and subtraction; the operations with respect to different moduli can all be done at the same time, so we obtain a substantial increase in speed. The same kind of decrease in execution time could not be achieved by the conventional techniques discussed in the previous section, since carry propagation must be considered. Perhaps some day highly parallel computers will make simultaneous operations commonplace, so that modular arithmetic will be of significant importance in "real-time" calculations when a quick answer to a single problem requiring high precision is needed. (With highly parallel computers, it is often preferable to run $k$ separate programs simultaneously, instead of running a single program $k$ times as fast, since the latter alternative is more complicated but does not utilize the machine any more efficiently; "real-time" calculations are exceptions which make the inherent parallelism of modular arithmetic more significant.)
Now let us examine the basic fact which underlies the modular representation of numbers:

Theorem C (Chinese Remainder Theorem). Let $m_1, m_2, \ldots, m_r$ be positive integers which are relatively prime in pairs, i.e.,

$$\gcd(m_j, m_k) = 1 \quad \text{when} \quad j \ne k. \tag{5}$$

Let $m = m_1 m_2 \ldots m_r$, and let $a, u_1, u_2, \ldots, u_r$ be integers. Then there is exactly one integer $u$ which satisfies the conditions

$$a \le u < a + m, \quad \text{and} \quad u \equiv u_j \;(\text{modulo } m_j) \quad \text{for } 1 \le j \le r. \tag{6}$$

Proof. If $u \equiv v$ (modulo $m_j$) for $1 \le j \le r$, then $u - v$ is a multiple of $m_j$ for all $j$, so (5) implies that $u - v$ is a multiple of $m = m_1 m_2 \ldots m_r$. This argument shows that there is at most one solution of (6). To complete the proof we must only show the existence of at least one solution, and this can be done in two simple ways:

Method 1 ("Nonconstructive" proof). As $u$ runs through the $m$ distinct values $a \le u < a + m$, the $r$-tuples $(u \bmod m_1, \ldots, u \bmod m_r)$ must also run through $m$ distinct values, since (6) has at most one solution. But there are exactly $m_1 m_2 \ldots m_r$ possible $r$-tuples $(v_1, \ldots, v_r)$ such that $0 \le v_j < m_j$. Therefore each $r$-tuple must occur exactly once, and there must be some value of $u$ for which $(u \bmod m_1, \ldots, u \bmod m_r) = (u_1, \ldots, u_r)$.

Method 2 ("Constructive" proof). We can find numbers $M_j$, $1 \le j \le r$, such that

$$M_j \equiv 1 \;(\text{modulo } m_j) \quad \text{and} \quad M_j \equiv 0 \;(\text{modulo } m_k) \quad \text{for } k \ne j. \tag{7}$$

This follows because (5) implies that $m_j$ and $m/m_j$ are relatively prime, so we may take

$$M_j = (m/m_j)^{\varphi(m_j)} \tag{8}$$

by Euler's theorem (exercise 1.2.4–28). Now the number

$$u = a + \bigl((u_1 M_1 + u_2 M_2 + \cdots + u_r M_r - a) \bmod m\bigr) \tag{9}$$

satisfies all the conditions of (6).

A very special case of this theorem was stated by the Chinese mathematician Sun-Tsŭ, who gave a rule called tái-yen ("great generalization"); the date of his writing is very uncertain, although it is thought to be between 280 and 473 A.D. [See Joseph Needham, Science and Civilization in China 3 (Cambridge University Press, 1959), 33–34, for an interesting discussion.] Theorem C was apparently first stated and proved in its proper generality by Chhin Chiu-Shao in his Shu Shu Chiu Chang (1247). Numerous early contributions to this theory have been summarized by L. E. Dickson in his History of the Theory of Numbers 2 (New York: Chelsea, 1952), 57–64.

As a consequence of Theorem C, we may use modular representation for numbers in any consecutive interval of $m = m_1 m_2 \ldots m_r$ integers. For example, we could take $a = 0$ in (6), and work only with nonnegative integers $u$ less than $m$. On the other hand, when addition and subtraction are being done, as well as multiplication, it is usually most convenient to assume that all the moduli $m_1, m_2, \ldots, m_r$ are odd numbers, so that $m = m_1 m_2 \ldots m_r$ is odd, and to work with integers in the range

$$-\frac{m}{2} < u < \frac{m}{2}, \tag{10}$$

which is completely symmetrical about zero.

To perform the basic operations indicated in (2), (3), and (4), we need to compute $(u_j + v_j) \bmod m_j$, $(u_j - v_j) \bmod m_j$, and $u_j v_j \bmod m_j$, when $0 \le u_j, v_j < m_j$. If $m_j$ is a single-precision number, it is most convenient to form $u_j v_j \bmod m_j$ by doing a multiplication and then a division operation. For addition and subtraction, the situation is a little simpler, since no division is necessary; the following formulas may conveniently be used:
$$(u_j + v_j) \bmod m_j = \begin{cases} u_j + v_j, & \text{if } u_j + v_j < m_j;\\ u_j + v_j - m_j, & \text{if } u_j + v_j \ge m_j. \end{cases} \tag{11}$$

$$(u_j - v_j) \bmod m_j = \begin{cases} u_j - v_j, & \text{if } u_j - v_j \ge 0;\\ u_j - v_j + m_j, & \text{if } u_j - v_j < 0. \end{cases} \tag{12}$$

(Cf. Section 3.2.1.1.) In this case, since we want $m$ to be as large as possible, it is easiest to let $m_1$ be the largest odd number that fits in a computer word, to let $m_2$ be the largest odd number $< m_1$ that is relatively prime to $m_1$, to let $m_3$ be the largest odd number $< m_2$ that is relatively prime to both $m_1$ and $m_2$, and so on until enough $m_j$'s have been found to give the desired range $m$. Efficient ways to determine whether or not two integers are relatively prime are discussed in Section 4.5.2.

As a simple example, suppose that we have a decimal computer with a word size of only 100. Then the procedure described in the previous paragraph would give

$$m_1 = 99, \quad m_2 = 97, \quad m_3 = 95, \quad m_4 = 91, \quad m_5 = 89, \quad m_6 = 83, \tag{13}$$

and so on.

On binary computers it is sometimes desirable to choose the $m_j$ in a different way, by selecting

$$m_j = 2^{e_j} - 1. \tag{14}$$

In other words, each modulus is one less than a power of 2. Such a choice of $m_j$ often makes the basic arithmetic operations simpler, because it is relatively easy to work modulo $2^{e_j} - 1$, as in ones' complement arithmetic. When the moduli are chosen according to this strategy, it is helpful to relax the condition $0 \le u_j < m_j$ slightly, so that we require only

$$0 \le u_j < 2^{e_j}, \qquad u_j \equiv u \;(\text{modulo } 2^{e_j} - 1). \tag{15}$$

Thus, the value $u_j = m_j = 2^{e_j} - 1$ is allowed as an optional alternative to $u_j = 0$, since this does not affect the validity of Theorem C, and it means we are allowing $u_j$ to be any $e_j$-bit binary number. Under this assumption, the operations of addition and multiplication modulo $m_j$ become the following:

$$u_j \oplus v_j = \begin{cases} u_j + v_j, & \text{if } u_j + v_j < 2^{e_j};\\ \bigl((u_j + v_j) \bmod 2^{e_j}\bigr) + 1, & \text{if } u_j + v_j \ge 2^{e_j}. \end{cases} \tag{16}$$

$$u_j \otimes v_j = (u_j v_j \bmod 2^{e_j}) \oplus \lfloor u_j v_j / 2^{e_j} \rfloor. \tag{17}$$

[Here $\oplus$ and $\otimes$ refer to the operations to be done on the individual components of $(u_1, \ldots, u_r)$ and $(v_1, \ldots, v_r)$ when adding or multiplying, respectively, using the convention (15).] Equation (12) may be used for subtraction. Clearly, these operations can be readily performed even when $m_j$ is larger than the computer's word size; it is a simple matter to compute the remainder of a positive number modulo a power of 2, or to divide a number by a power of 2.
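Operations (16) and (17) need nothing more than additions, masks, and shifts. The Python sketch below is our own illustration, not a prescribed implementation. The function names are hypothetical, and residues may be any $e$-bit number as in convention (15), so both $0$ and $2^e - 1$ represent zero.

```python
# Arithmetic modulo m = 2**e - 1 under convention (15): a residue may be
# any e-bit number, so 0 and 2**e - 1 both stand for zero.

def add_mod(u, v, e):
    """u (+) v as in (16); assumes 0 <= u, v < 2**e."""
    s = u + v
    if s < (1 << e):
        return s
    # End-around carry, as in ones' complement arithmetic:
    # (s mod 2**e) + 1 equals s - (2**e - 1).
    return (s & ((1 << e) - 1)) + 1

def mul_mod(u, v, e):
    """u (x) v as in (17): add the lower and upper halves of u*v."""
    p = u * v
    low = p & ((1 << e) - 1)    # u*v mod 2**e
    high = p >> e               # floor(u*v / 2**e), also less than 2**e
    return add_mod(low, high, e)

print(mul_mod(20, 25, 5))   # 4, since 500 mod 31 = 4
```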
In (17) we have the sum of the "upper half" and the "lower half" of the product, as discussed in exercise 3.2.1.1–8.

If moduli of the form $2^{e_j} - 1$ are to be used, we must know under what conditions the number $2^e - 1$ is relatively prime to the number $2^f - 1$. Fortunately, there is a very simple rule,

$$\gcd(2^e - 1, 2^f - 1) = 2^{\gcd(e, f)} - 1, \tag{18}$$

which states in particular that $2^e - 1$ and $2^f - 1$ are relatively prime if and only if $e$ and $f$ are relatively prime. Equation (18) follows from Euclid's algorithm and the identity

$$(2^e - 1) \bmod (2^f - 1) = 2^{e \bmod f} - 1. \tag{19}$$

(See exercise 6.) Thus we could choose, for example, $m_1 = 2^{35} - 1$, $m_2 = 2^{34} - 1$, $m_3 = 2^{33} - 1$, $m_4 = 2^{31} - 1$, $m_5 = 2^{29} - 1$, if we had a computer with word size $2^{35}$ and if we wanted to represent numbers up to $m_1 m_2 m_3 m_4 m_5 > 2^{161}$. This range of integers is not big enough to make modular arithmetic faster than the conventional method, and we usually find that modular arithmetic using convention (15) is advantageous only when the $m_j$ are larger than the word size or when division is inconvenient.

As we have already observed, the operations of conversion to and from modular representation are very important. If we are given a number $u$, its modular representation $(u_1, \ldots, u_r)$ may be obtained by dividing $u$ by $m_1, \ldots, m_r$ and saving the remainders. A possibly more attractive procedure, if $u = (v_m v_{m-1} \ldots v_0)_b$, is to evaluate the polynomial

$$(\ldots(v_m b + v_{m-1})b + \cdots)b + v_0$$

using modular arithmetic. When $b = 2$ and when the modulus $m_j$ has the special form $2^{e_j} - 1$, both of these methods reduce to quite a simple procedure: Consider the binary representation of $u$ with blocks of $e_j$ bits grouped together,

$$u = a_t A^t + a_{t-1} A^{t-1} + \cdots + a_1 A + a_0, \tag{20}$$

where $A = 2^{e_j}$ and $0 \le a_k < 2^{e_j}$ for $0 \le k \le t$. Then

$$u \equiv a_t + a_{t-1} + \cdots + a_1 + a_0 \;(\text{modulo } 2^{e_j} - 1), \tag{21}$$

since $A \equiv 1$. Therefore we may obtain $u_j$ by adding the $e_j$-bit numbers $a_t \oplus \cdots \oplus a_1 \oplus a_0$, modulo $2^{e_j} - 1$, as in Eq. (16). This process is similar to the familiar device of "casting out nines" which is used to determine $u \bmod 9$ when $u$ is expressed in the decimal system.

Conversion back from modular form to positional notation is somewhat more difficult. It is interesting in this regard to make a few side remarks about the way computers make us change our viewpoint towards mathematical proofs: Theorem C tells us that the conversion from $(u_1, \ldots, u_r)$ to $u$ is possible, and two proofs are given. The first proof we considered is a classical one which makes use only of very simple concepts, namely the facts that

i) any number which is a multiple of $m_1$, and of $m_2$, ..., and of $m_r$, must be a multiple of $m_1 m_2 \ldots m_r$ when the $m_j$'s are pairwise relatively prime; and

ii) if $m$ things are put into $m$ boxes with no two things in the same box, then there must be one in each box.

By traditional notions of mathematical aesthetics, this is no doubt the nicest proof of Theorem C; but from a computational standpoint it is completely worthless! It amounts to saying, "Try $u = a, a + 1, \ldots$ until you find a value for which $u \equiv u_1$ (modulo $m_1$), ..., $u \equiv u_r$ (modulo $m_r$)."
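To make the contrast concrete, here is a small Python sketch of both existence proofs. It rests on our own assumptions: the function names are hypothetical, and Euler's $\varphi$ is computed by trial-division factorization, which is feasible only for small moduli. Method 1 is literally the search just described. Method 2 applies (8) and (9), reducing each $M_j$ modulo $m$. This reduction is harmless because $m_k$ divides $m$ for every $k$, so the congruences (7) still hold.

```python
from math import prod

def crt_search(residues, moduli, a=0):
    """Method 1 as an algorithm: try u = a, a+1, ... until all congruences hold.
    Up to m = m_1 m_2 ... m_r trials, hence useless for realistic moduli."""
    m = prod(moduli)
    for u in range(a, a + m):
        if all(u % mj == uj % mj for uj, mj in zip(residues, moduli)):
            return u
    raise ValueError("no solution; the moduli are not relatively prime in pairs")

def euler_phi(n):
    """Euler's totient via trial-division factorization (small n only)."""
    result, p = n, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result

def crt_euler(residues, moduli, a=0):
    """Method 2: M_j from (8), reduced mod m, then u from (9)."""
    m = prod(moduli)
    total = 0
    for uj, mj in zip(residues, moduli):
        Mj = pow(m // mj, euler_phi(mj), m)   # (m/m_j)^phi(m_j) mod m
        total += uj * Mj
    return a + (total - a) % m

print(crt_euler((1, 4, 10), (7, 11, 13)))   # 400
```

Even in this toy form, Method 2 needs a factorization for each $\varphi(m_j)$, and all of its arithmetic is done modulo the large number $m$. These are exactly the drawbacks discussed next.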
The second proof of Theorem C is more explicit; it shows how to compute $r$ new constants $M_1, \ldots, M_r$, and to get the solution in terms of these constants by formula (9). This proof uses more complicated concepts (for example, Euler's theorem), but it is much more satisfactory from a computational standpoint, since the constants $M_1, \ldots, M_r$ need to be determined only once. On the other hand, the determination of $M_j$ by Eq. (8) is certainly not trivial, since the evaluation of Euler's $\varphi$-function requires, in general, the factorization of $m_j$ into prime powers. Furthermore, $M_j$ is likely to be a terribly large number, even if we compute only the quantity $M_j \bmod m$ (which will work just as well as $M_j$ in (9)). Since $M_j \bmod m$ is uniquely determined if (7) is to be satisfied (because of the Chinese Remainder Theorem!), we can see that, in any event, Eq. (9) requires a lot of high-precision calculation, and such calculation is just what we wished to avoid by modular arithmetic in the first place.

So we need an even better proof of Theorem C if we are going to have a really usable method of conversion from $(u_1, \ldots, u_r)$ to $u$. Such a method was suggested by H. L. Garner in 1958; it can be carried out using $\binom{r}{2}$ constants $c_{ij}$ for $1 \le i < j \le r$, where

$$c_{ij} m_i \equiv 1 \;(\text{modulo } m_j). \tag{22}$$

These constants $c_{ij}$ are readily computed using Euclid's algorithm, since Algorithm 4.5.2X determines $a, b$ such that $a m_i + b m_j = \gcd(m_i, m_j) = 1$, and we may take $c_{ij} = a$. When the moduli have the special form $2^{e_j} - 1$, a simple method of determining $c_{ij}$ is given in exercise 6.

Once the $c_{ij}$ have been determined satisfying (22), we can set

$$\begin{aligned}
v_1 &\leftarrow u_1 \bmod m_1,\\
v_2 &\leftarrow (u_2 - v_1) c_{12} \bmod m_2,\\
v_3 &\leftarrow \bigl((u_3 - v_1) c_{13} - v_2\bigr) c_{23} \bmod m_3,\\
&\;\;\vdots\\
v_r &\leftarrow \bigl(\ldots((u_r - v_1) c_{1r} - v_2) c_{2r} - \cdots - v_{r-1}\bigr) c_{(r-1)r} \bmod m_r.
\end{aligned} \tag{23}$$

Then

$$u = v_r m_{r-1} \ldots m_1 + \cdots + v_3 m_2 m_1 + v_2 m_1 + v_1 \tag{24}$$

is a number satisfying the conditions

$$0 \le u < m, \qquad u \equiv u_j \;(\text{modulo } m_j), \quad 1 \le j \le r. \tag{25}$$

(See exercise 8; another way of rewriting (23) which does not involve as many auxiliary constants is given in exercise 7.) Equation (24) is a mixed radix representation of $u$, which may be converted to binary or decimal notation using the methods of Section 4.4. If $0 \le u < m$ is not the desired range, an appropriate multiple of $m$ can be added or subtracted after the conversion process.

The advantage of the computation shown in (23) is that the calculation of $v_j$ can be done using only arithmetic mod $m_j$, which is already built into the modular arithmetic algorithms. Furthermore, (23) allows parallel computation: We can start with $(v_1, \ldots, v_r) \leftarrow (u_1 \bmod m_1, \ldots, u_r \bmod m_r)$, then at time $j$ for $1 \le j < r$ we simultaneously set $v_k \leftarrow (v_k - v_j) c_{jk} \bmod m_k$ for $j < k \le r$. An alternative way to compute the mixed-radix representation, allowing similar possibilities for parallelism, has been discussed by A. S. Fraenkel, Proc. ACM Nat. Conf. 19 (Philadelphia, 1965), E1.4.

It is important to observe that the mixed radix representation (24) is sufficient to compare the magnitudes of two modular numbers. For if we know that $0 \le u < m$ and $0 \le u' < m$, then we can tell if $u < u'$ by first doing the conversion to $v_1, \ldots, v_r$ and $v'_1, \ldots, v'_r$, then testing if $v_r < v'_r$, or if $v_r = v'_r$ and $v_{r-1} < v'_{r-1}$, etc. It is not necessary to convert all the way to binary or decimal notation if we only want to know whether $(u_1, \ldots, u_r)$ is less than $(u'_1, \ldots, u'_r)$.
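Garner's method (22)–(24) and the comparison rule just described fit in a few lines of Python. This is a sketch under our own assumptions. The names are hypothetical, and the constants $c_{ij}$ come from a standard extended Euclidean algorithm in the spirit of Algorithm 4.5.2X. The digit loop uses the parallel ordering mentioned above: at step $j$, every $v_k$ with $k > j$ is updated. This ordering produces the same digits $v_k$ as (23).

```python
def ext_gcd(x, y):
    """Return (g, a, b) with a*x + b*y == g == gcd(x, y)."""
    a0, b0, a1, b1 = 1, 0, 0, 1
    while y:
        q = x // y
        x, y = y, x - q * y
        a0, a1 = a1, a0 - q * a1
        b0, b1 = b1, b0 - q * b1
    return x, a0, b0

def garner_constants(moduli):
    """c[i][j] with c[i][j] * m_i = 1 (mod m_j) for i < j, as in (22)."""
    r = len(moduli)
    c = [[0] * r for _ in range(r)]
    for i in range(r):
        for j in range(i + 1, r):
            g, a, _ = ext_gcd(moduli[i], moduli[j])
            if g != 1:
                raise ValueError("moduli must be relatively prime in pairs")
            c[i][j] = a % moduli[j]
    return c

def mixed_radix_digits(residues, moduli, c):
    """Digits v_1, ..., v_r of (23), using only arithmetic mod each m_k."""
    v = [uk % mk for uk, mk in zip(residues, moduli)]
    for j in range(len(v)):                 # "time j" in the parallel scheme
        for k in range(j + 1, len(v)):
            v[k] = (v[k] - v[j]) * c[j][k] % moduli[k]
    return v

def from_mixed_radix(v, moduli):
    """Evaluate (24) by Horner's rule: (...(v_r m_{r-1} + v_{r-1})...) m_1 + v_1."""
    u = v[-1]
    for k in range(len(v) - 2, -1, -1):
        u = u * moduli[k] + v[k]
    return u

def less_than(res_u, res_w, moduli, c):
    """True if u < u' (both in [0, m)), comparing digits from v_r downward."""
    vu = mixed_radix_digits(res_u, moduli, c)
    vw = mixed_radix_digits(res_w, moduli, c)
    return vu[::-1] < vw[::-1]              # lexicographic list comparison

mods = (7, 11, 13)
c = garner_constants(mods)                  # c_12 = 8, c_13 = 2, c_23 = 6
v = mixed_radix_digits((1, 4, 10), mods, c) # [1, 2, 5]
print(from_mixed_radix(v, mods))            # 400 = 5*77 + 2*7 + 1
```

Only the final call to `from_mixed_radix` works with numbers as large as $m$. The comparison in `less_than` never leaves single-modulus arithmetic.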
The operation of comparing two numbers, or of deciding if a modular number is negative, is intuitively very simple, so we would expect to find a much easier method for making this test than the conversion to mixed radix form. But the following theorem shows that there is little hope of finding a substantially easier method, since the range of a modular number depends essentially on all bits of all the residues $(u_1, \ldots, u_r)$.

Theorem S (Nicholas Szabó, 1961). In terms of the notation above, assume that $m_1 < \sqrt{m}$, and let $L$ be any value in the range

$$m_1 \le L \le m - m_1. \tag{26}$$

Let $g$ be any function such that the set $\{g(0), g(1), \ldots, g(m_1 - 1)\}$ contains fewer than $m_1$ values. Then there are numbers $u$ and $v$ such that

$$g(u \bmod m_1) = g(v \bmod m_1), \qquad u \bmod m_j = v \bmod m_j \quad \text{for } 2 \le j \le r; \tag{27}$$

$$0 \le u < L \le v < m. \tag{28}$$

Proof. By hypothesis, there must exist numbers $u \ne v$ satisfying (27), since $g$ must take on the same value for two different residues. Let $(u, v)$ be a pair of values with $0 \le u < v < m$ satisfying (27), for which $u$ is a minimum. Since $u' = u - m_1$ and $v' = v - m_1$ also satisfy (27), we must have $u' < 0$ by the minimality of $u$. Hence $u < m_1 \le L$; and if (28) does not hold, we must have $v < L$. But $v > u$, and $v - u$ is a multiple of $m_2 \ldots m_r = m/m_1$, so $v \ge v - u \ge m/m_1 > m_1$. Therefore, if (28) does not hold for $(u, v)$, it will be satisfied for the pair $(u'', v'') = (v - m_1, u + m - m_1)$.

Of course, a similar result can be proved for any $m_j$ in place of $m_1$; and we could also replace (28) by the condition "$a \le u < a + L \le v < a + m$" with only minor changes in the proof. Therefore Theorem S shows that many simple functions cannot be used to determine the range of a modular number.

Let us now reiterate the main points of the discussion in this section: Modular arithmetic can be a significant advantage for applications in which the predominant calculations involve exact multiplication (or raising to a power) of large integers, combined with addition and subtraction, but where there is very little need to divide or compare numbers, or to test whether intermediate results "overflow" out of range. (It is important not to forget the latter restriction; methods are available to test for overflow, as in exercise 12, but they are in general so complicated that they nullify the advantages of modular arithmetic.) Several applications for modular computations have been discussed by H. Takahasi and Y. Ishibashi, Information Processing in Japan 1 (1961), 28–42.

An example of such an application is the exact solution of linear equations with rational coefficients. For various reasons it is desirable in this case to assume that the moduli $m_1, m_2, \ldots, m_r$ are all large prime numbers; the linear equations can be solved independently modulo each $m_j$. A detailed discussion of this procedure has been given by I. Borosh and A. S. Fraenkel [Math. Comp. 20 (1966), 107–112]. By means of their method, the nine independent solutions of a system of 111 linear equations in 120 unknowns were obtained exactly in less than one hour's running time on a CDC 1604 computer. The same procedure is worthwhile also for solving simultaneous linear equations with floating-point coefficients, when the matrix of coefficients is ill-conditioned. The modular technique (treating the given floating-point coefficients as exact rational numbers) gives a method for obtaining the true answers in less time than conventional methods can produce reliable approximate answers! [See M. T. McClellan, JACM 20 (1973), 563–588, for further developments of this approach; and see also E. H. Bareiss, J. Inst. Math. and Appl. 10 (1972), 68–104, for a discussion of its limitations.]

The published literature concerning modular arithmetic is mostly oriented towards hardware design, since the carry-free properties of modular arithmetic make it attractive from the standpoint of high-speed operation. The idea was first published by A. Svoboda and M. Valach in the Czechoslovakian journal Stroje na Zpracování Informací 3 (1955), 247–295; then independently by H. L. Garner [IRE Transactions EC-8 (1959), 140–147]. The use of moduli of the form $2^{e_j} - 1$ was suggested by A. S. Fraenkel [JACM 8 (1961), 87–96], and several advantages of such moduli were demonstrated by A. Schönhage [Computing 1 (1966), 182–196]. See the book Residue Arithmetic and its Applications to Computer Technology by N. S. Szabó and R. I. Tanaka (New York: McGraw-Hill, 1967), for additional information and a comprehensive bibliography of the subject.
Further discussion of modular arithmetic can be found in part B of Section 4.3.3.

EXERCISES

1. [20] Find all integers $u$ which satisfy the conditions $u \bmod 7 = 1$, $u \bmod 11 = 6$, $u \bmod 13 = 5$, and $0 \le u < 1000$.

2. [M20] Would Theorem C still be true if we allowed $a, u_1, u_2, \ldots, u_r$ and $u$ to be arbitrary real numbers (not just integers)?

3. [M26] (Generalized Chinese Remainder Theorem.) Let $m_1, m_2, \ldots, m_r$ be positive integers. Let $m$ be the least common multiple of $m_1, m_2, \ldots, m_r$, and let $a, u_1, u_2, \ldots, u_r$ be any integers. Prove that there is exactly one integer $u$ which satisfies the conditions
$$a \le u < a + m, \qquad u \equiv u_j \pmod{m_j}, \quad 1 \le j \le r,$$
provided that
$$u_i \equiv u_j \pmod{\gcd(m_i, m_j)}, \qquad 1 \le i < j \le r;$$
and there is no such integer $u$ when the latter condition fails to hold.

4. [20] Continue the process shown in (13); what would $m_7$, $m_8$, $m_9$, $m_{10}$ be?
5. [M23] Suppose that the method of (13) is continued until no more $m_j$ can be chosen; does this method give the largest attainable value $m_1 m_2 \ldots m_r$ such that the $m_j$ are odd positive integers less than 100 which are relatively prime in pairs?
6. [M22] Let $e$, $f$, $g$ be nonnegative integers. (a) Show that $2^e \equiv 2^f \pmod{2^g - 1}$ if and only if $e \equiv f \pmod{g}$. (b) Given that $e \bmod f = d$ and $ce \bmod f = 1$, prove that
$$\bigl((1 + 2^d + \cdots + 2^{(c-1)d}) \cdot (2^e - 1)\bigr) \bmod (2^f - 1) = 1.$$
[Thus, we have a comparatively simple formula for the inverse of $2^e - 1$, modulo $2^f - 1$, as required in (22).]

7. [M21] Show that (23) can be rewritten as follows:
$$\begin{aligned}
v_1 &\leftarrow u_1 \bmod m_1,\\
v_2 &\leftarrow (u_2 - v_1)\,c_{12} \bmod m_2,\\
v_3 &\leftarrow \bigl(u_3 - (v_1 + m_1 v_2)\bigr)\,c_{13}c_{23} \bmod m_3,\\
&\;\;\vdots\\
v_r &\leftarrow \bigl(u_r - (v_1 + m_1(v_2 + m_2(v_3 + \cdots + m_{r-2}v_{r-1})\ldots))\bigr)\,c_{1r}\ldots c_{(r-1)r} \bmod m_r.
\end{aligned}$$
If the formulas are rewritten in this way, we see that only $r - 1$ constants $C_j = c_{1j} \ldots c_{(j-1)j} \bmod m_j$ are needed instead of $r(r-1)/2$ constants $c_{ij}$ as in (23). Discuss the relative merits of this version of the formula as compared to (23), from the standpoint of computer calculation.

8. [M21] Prove that the number $u$ defined by (23) and (24) satisfies (25).

9. [M20] Show how to go from the values $v_1, \ldots, v_r$ of the mixed radix notation (24) back to the original residues $u_1, \ldots, u_r$, using only arithmetic mod $m_j$ to compute $u_j$.

10. [M25] An integer $u$ which lies in the symmetrical range (10) might be represented by finding the numbers $u_1, \ldots, u_r$ such that $u \equiv u_j \pmod{m_j}$ and $-m_j/2 < u_j < m_j/2$, instead of insisting that $0 \le u_j < m_j$ as in the text. Discuss the modular arithmetic procedures that would be used in this case (including the conversion process, (23)).

11. [M23] Assume that all the $m_j$ are odd, and that $u = (u_1, \ldots, u_r)$ is known to be even, where $0 \le u < m$. Find a reasonably fast method to compute $u/2$ using modular arithmetic.

12. [M10] Prove that, if $0 \le u, v < m$, the modular addition of $u$ and $v$ causes overflow (i.e., is outside the range allowed by the modular representation) if and only if the sum is less than $u$. (Thus the overflow detection problem is equivalent to the comparison problem.)

13. [M25] (Automorphic numbers.) An $n$-place decimal number $x > 1$ is called an "automorph" by recreational mathematicians if the last $n$ digits of $x^2$ are equal to $x$; i.e., if $x^2 \bmod 10^n = x$. [See Scientific American 218 (January, 1968), 125.] For example, 9376 is a 4-place automorph, since $9376^2 = 87909376$.

(a) Prove that an $n$-place number $x > 1$ is an automorph if and only if $x \bmod 5^n = 0$ or 1, and $x \bmod 2^n = 1$ or 0, respectively. [Thus, if $m_1 = 2^n$ and $m_2 = 5^n$, the only two $n$-place automorphs are the numbers $M_1$ and $M_2$ in (7).]

(b) Prove that if $x$ is an $n$-place automorph, then $(3x^2 - 2x^3) \bmod 10^{2n}$ is a $2n$-place automorph.

(c) Given that $cx \equiv 1 \pmod{y}$, what is a simple formula for a number $c'$ such that $c'x^2 \equiv 1 \pmod{y^2}$?
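Part (b) of exercise 13 can be checked numerically before it is proved. The Python sketch below is only a check on a few cases, not a proof; the function names `is_automorph` and `double_automorph` are our own, chosen for illustration. It starts from the 1-place automorph 6 and applies the map $x \mapsto (3x^2 - 2x^3) \bmod 10^{2n}$ repeatedly.

```python
def is_automorph(x: int, n: int) -> bool:
    """True if x is an n-place automorph: 1 < x < 10**n and x*x mod 10**n == x."""
    return 1 < x < 10**n and (x * x) % 10**n == x


def double_automorph(x: int, n: int) -> int:
    """The map of exercise 13(b): (3x^2 - 2x^3) mod 10^(2n).

    For a positive modulus Python's % returns a value in [0, modulus),
    so the negative intermediate 3x^2 - 2x^3 is reduced correctly.
    """
    return (3 * x**2 - 2 * x**3) % 10 ** (2 * n)


if __name__ == "__main__":
    x, n = 6, 1                      # 6 * 6 = 36 ends in 6
    while n <= 8:
        assert is_automorph(x, n)
        print(n, x)                  # prints 1 6, 2 76, 4 9376, 8 87109376
        x, n = double_automorph(x, n), 2 * n
```

The chain passes through 9376, the 4-place automorph quoted in the exercise.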
folio 372 galley 10
*4.3.3. How Fast Can We Multiply?

The conventional method for multiplication, Algorithm 4.3.1M, requires approximately $cmn$ operations to multiply an $m$-digit number by an $n$-digit number, where $c$ is a constant. In this section, let us assume for convenience that $m = n$, and let us consider the following question: Does every general computer algorithm for multiplying two $n$-digit numbers require an execution time proportional to $n^2$, as $n$ increases?

(In this question, a "general" algorithm means one which accepts, as input, the number $n$ and two arbitrary $n$-digit numbers in positional notation, and whose output is their product in positional form. Certainly if we were allowed to choose a different algorithm for each value of $n$, the question would be of no interest, since multiplication could be done for any specific value of $n$ by a "table-lookup" operation in some huge table. The term "computer algorithm" is meant to imply an algorithm which is suitable for implementation on a digital computer such as MIX, and the execution time is to be the time it takes to perform the algorithm on such a computer.)

A. Digital methods. The answer to the above question is, rather surprisingly, "No," and, in fact, it is not very difficult to see why. For convenience, let us assume throughout this section that we are working with integers expressed in binary notation. If we have two $2n$-bit numbers $u = (u_{2n-1} \ldots u_1 u_0)_2$ and $v = (v_{2n-1} \ldots v_1 v_0)_2$, we can write
$$u = 2^n U_1 + U_0, \qquad v = 2^n V_1 + V_0, \tag{1}$$
where $U_1 = (u_{2n-1} \ldots u_n)_2$ is the "most-significant half" of $u$ and $U_0 = (u_{n-1} \ldots u_0)_2$ is the "least-significant half"; and similarly $V_1 = (v_{2n-1} \ldots v_n)_2$, $V_0 = (v_{n-1} \ldots v_0)_2$. Now we have
$$uv = (2^{2n} + 2^n)\,U_1V_1 + 2^n (U_1 - U_0)(V_0 - V_1) + (2^n + 1)\,U_0V_0. \tag{2}$$
This formula reduces the problem of multiplying $2n$-bit numbers to three multiplications of $n$-bit numbers, $U_1V_1$, $(U_1 - U_0)(V_0 - V_1)$, and $U_0V_0$, plus some simple shifting and adding operations.

Formula (2) can be used for double-precision multiplication when a quadruple-precision result is desired, and it is just a little faster than the traditional method on many machines. It is more important to observe that we can use formula (2) to define a recursive process for multiplication which is significantly faster than the familiar order-$n^2$ method when $n$ is large: If $T(n)$ is the time required to perform multiplication of $n$-bit numbers, we have
$$T(2n) \le 3T(n) + cn \tag{3}$$
for some constant $c$, since the right-hand side of (2) uses just three multiplications plus some additions and shifts. Relation (3) implies by induction that
$$T(2^k) \le c(3^k - 2^k), \qquad k \ge 1, \tag{4}$$
if we choose $c$ to be large enough so that this inequality is valid when $k = 1$; and therefore we have
$$T(n) \le T(2^{\lceil \lg n \rceil}) \le c\bigl(3^{\lceil \lg n \rceil} - 2^{\lceil \lg n \rceil}\bigr) < 3c \cdot 3^{\lg n} = 3c\,n^{\lg 3}. \tag{5}$$
Relation (5) shows that the running time for multiplication can be reduced from order $n^2$ to order $n^{\lg 3} \approx n^{1.585}$, and of course this is a much faster algorithm when $n$ is large.

(A similar but more complicated method for doing multiplication with running time of order $n^{\lg 3}$ was apparently first suggested by A. Karatsuba and Yu. Ofman, Doklady Akad. Nauk SSSR 145 (1962), 293–294. Curiously, this idea does not seem to have been discovered before 1962; none of the "calculating prodigies" who have become famous for their ability to multiply large numbers mentally have been reported to use any such method, although formula (2) adapted to decimal notation would seem to lead to a reasonably easy way to multiply eight-digit numbers in one's head.)

The running time can be reduced still further, in the limit as $n$ approaches infinity, if we observe that the method just used is essentially the special case $r = 1$ of a more general method that yields
$$T\bigl((r+1)n\bigr) \le (2r+1)\,T(n) + cn \tag{6}$$
for any fixed $r$. This more general method can be obtained as follows: Let
$$u = (u_{(r+1)n-1} \ldots u_1 u_0)_2 \quad\text{and}\quad v = (v_{(r+1)n-1} \ldots v_1 v_0)_2$$
be broken into $r+1$ pieces,
$$u = U_r 2^{rn} + \cdots + U_1 2^n + U_0, \qquad v = V_r 2^{rn} + \cdots + V_1 2^n + V_0, \tag{7}$$
where each $U_j$ and each $V_j$ is an $n$-bit number. Consider the polynomials
$$U(x) = U_r x^r + \cdots + U_1 x + U_0, \qquad V(x) = V_r x^r + \cdots + V_1 x + V_0, \tag{8}$$
and let
$$W(x) = U(x)V(x) = W_{2r} x^{2r} + \cdots + W_1 x + W_0. \tag{9}$$
Since $u = U(2^n)$ and $v = V(2^n)$, we have $uv = W(2^n)$, so we can easily compute $uv$ if we know the coefficients of $W(x)$. The problem is to find a good way to compute the coefficients of $W(x)$ by using only $2r+1$ multiplications
of $n$-bit numbers plus some further operations which involve only an execution time proportional to $n$. This can be done by computing
$$U(0)V(0) = W(0), \quad U(1)V(1) = W(1), \quad \ldots, \quad U(2r)V(2r) = W(2r). \tag{10}$$
The coefficients of a polynomial of degree $2r$ can be written as a linear combination of the values of that polynomial at $2r+1$ distinct points; such a linear combination requires an execution time at most proportional to $n$. (Actually, the products $U(j)V(j)$ are not strictly products of $n$-bit numbers, but they are products of at most $(n+t)$-bit numbers, where $t$ is a fixed value depending on $r$. It is easy to design a multiplication routine …

Relation (6) can be used to show that $T(n) \le c_3 n^{\log_{r+1}(2r+1)} < c_3 n^{1 + \log_{r+1} 2}$, using a method analogous to the derivation of (5), so we have now proved:

Theorem A. Given $\epsilon > 0$, there exists a constant $c(\epsilon)$ and a multiplication algorithm such that the number of elementary operations $T(n)$ needed to multiply two $n$-bit numbers satisfies
$$T(n) < c(\epsilon)\,n^{1+\epsilon}. \tag{11}$$
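Formula (2), applied recursively as in (3)–(5), takes only a few lines in a language with unbounded integers. The Python sketch below is illustrative only. The function name `mul_formula2` and the 64-bit cutoff are our own assumptions, the built-in product stands in for the conventional method on short operands, and the code implements formula (2) itself rather than the Karatsuba–Ofman method cited above or the general method of (6)–(10).

```python
def mul_formula2(u: int, v: int, cutoff_bits: int = 64) -> int:
    """Product of nonnegative integers u, v using formula (2) recursively."""
    if u.bit_length() <= cutoff_bits or v.bit_length() <= cutoff_bits:
        return u * v                       # short operands: conventional product
    n = (max(u.bit_length(), v.bit_length()) + 1) // 2
    mask = (1 << n) - 1
    U1, U0 = u >> n, u & mask              # u = 2^n U1 + U0, as in (1)
    V1, V0 = v >> n, v & mask              # v = 2^n V1 + V0

    high = mul_formula2(U1, V1, cutoff_bits)   # U1 V1
    low = mul_formula2(U0, V0, cutoff_bits)    # U0 V0
    d1, d2 = U1 - U0, V0 - V1                  # either difference may be negative
    mid = mul_formula2(abs(d1), abs(d2), cutoff_bits)
    if (d1 < 0) != (d2 < 0):
        mid = -mid                             # (U1 - U0)(V0 - V1) with its sign

    # (2): uv = (2^{2n} + 2^n) U1V1 + 2^n (U1 - U0)(V0 - V1) + (2^n + 1) U0V0,
    # computed with shifts and additions only.
    return (high << (2 * n)) + (high << n) + (mid << n) + (low << n) + low


if __name__ == "__main__":
    print(mul_formula2(3**500, 7**400) == 3**500 * 7**400)   # True
```

Each call makes three recursive calls on operands of about half the length, which is exactly the recurrence (3).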
This theorem is still not the result we are after. It is unsatisfactory for practical purposes in that the method becomes much more complicated as $\epsilon \to 0$ (and therefore as $r \to \infty$), causing $c(\epsilon)$ to grow so rapidly that extremely huge values of $n$ are needed before we have any significant improvement over (5). And it is unsatisfactory for theoretical purposes because it does not make use of the full power of the polynomial method on which it is based. We can obtain a better result if we let $r$ vary with $n$, choosing larger and larger values of $r$ as $n$ increases. This idea is due to A. L. Toom [Doklady Akademiia Nauk SSSR 150 (1963), 496–498; tr. into English in Soviet Mathematics 3 (1963), 714–716], who used it to show that computer circuitry for multiplication of $n$-bit numbers can be constructed involving a fairly small number of components as $n$ grows. S. A. Cook [On the minimum computation time of functions (Thesis, Harvard University, 1966), 51–77] later showed how Toom's method can be adapted to fast computer programs.

Before we discuss the Toom–Cook algorithm any further, let us study a small example of the transition from $U(x)$ and $V(x)$ to the coefficients of $W(x)$. This example will not demonstrate the efficiency of the method, since the numbers are too small, but it points out some useful simplifications that we can make in the general case. Suppose that we want to multiply $u = 1234$ times $v = 2341$; in binary notation this is $u =$ …
folio 376 galley 11 WARNING: Some bad spots on this tape.
Hence we find, for $W(x) = U(x)V(x)$,
$$\begin{array}{lllll}
U(0) = 2, & U(1) = 19, & U(2) = 44, & U(3) = 77, & U(4) = 118;\\
V(0) = 5, & V(1) = 16, & V(2) = 45, & V(3) = 92, & V(4) = 157;\\
W(0) = 10, & W(1) = 304, & W(2) = 1980, & W(3) = 7084, & W(4) = 18526.
\end{array}\tag{12}$$
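The entries of (12) can be reproduced mechanically. The sketch below assumes, as the values in (12) imply, that $u$ and $v$ are each cut into $r + 1 = 3$ pieces of 4 bits ($1234 = 4 \cdot 16^2 + 13 \cdot 16 + 2$ and $2341 = 9 \cdot 16^2 + 2 \cdot 16 + 5$); the helper names `split` and `horner` are hypothetical. Evaluation uses Horner's rule, the same nested form that step C4 of Algorithm C uses.

```python
def split(x: int, q: int, pieces: int) -> list[int]:
    """Split x into `pieces` q-bit pieces, least significant first."""
    mask = (1 << q) - 1
    return [(x >> (q * i)) & mask for i in range(pieces)]


def horner(coeffs: list[int], j: int) -> int:
    """Evaluate sum(coeffs[i] * j**i) by Horner's rule, highest coefficient first."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * j + c
    return acc


# Assumed parameters: r = 2 (three 4-bit pieces), so 2r + 1 = 5 points.
U = split(1234, 4, 3)   # [2, 13, 4], i.e. U(x) = 4x^2 + 13x + 2
V = split(2341, 4, 3)   # [5, 2, 9],  i.e. V(x) = 9x^2 + 2x + 5
Uj = [horner(U, j) for j in range(5)]
Vj = [horner(V, j) for j in range(5)]
Wj = [a * b for a, b in zip(Uj, Vj)]
print(Uj)  # [2, 19, 44, 77, 118]
print(Vj)  # [5, 16, 45, 92, 157]
print(Wj)  # [10, 304, 1980, 7084, 18526]
```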
Our job now is to compute the five coefficients of $W(x)$ from the latter five values.

There is an attractive little algorithm which can be used to compute the coefficients of a polynomial $W(x) = W_{m-1}x^{m-1} + \cdots + W_1 x + W_0$ when the values $W(0), W(1), \ldots, W(m-1)$ are given: Let us first write
$$W(x) = \theta_{m-1}x^{\underline{m-1}} + \theta_{m-2}x^{\underline{m-2}} + \cdots + \theta_1 x^{\underline{1}} + \theta_0, \tag{13}$$
where $x^{\underline{k}} = x(x-1)\ldots(x-k+1)$, and where the $\theta_j$ are unknown as well as the $W_j$. Now
$$W(x+1) - W(x) = (m-1)\,\theta_{m-1}x^{\underline{m-2}} + (m-2)\,\theta_{m-2}x^{\underline{m-3}} + \cdots + \theta_1,$$
and by induction we find that for all $k \ge 0$
$$\frac{1}{k!}\left(W(x+k) - \binom{k}{1}W(x+k-1) + \binom{k}{2}W(x+k-2) - \cdots + (-1)^k W(x)\right) = \binom{m-1}{k}\theta_{m-1}x^{\underline{m-1-k}} + \binom{m-2}{k}\theta_{m-2}x^{\underline{m-2-k}} + \cdots + \binom{k}{k}\theta_k. \tag{14}$$
Denoting the left-hand side of (14) in the customary way as $(1/k!)\,\Delta^k W(x)$, we see that
$$\frac{1}{k!}\,\Delta^k W(x) = \frac{1}{k}\left(\frac{1}{(k-1)!}\,\Delta^{k-1}W(x+1) - \frac{1}{(k-1)!}\,\Delta^{k-1}W(x)\right)$$
and $(1/k!)\,\Delta^k W(0) = \theta_k$. So the coefficients $\theta_j$ can be evaluated using a very simple method, illustrated here for the polynomial $W(x)$ in (12):
$$\begin{array}{lllll}
10 \\
 & 294 \\
304 & & 1382/2 = 691 \\
 & 1676 & & 1023/3 = 341 \\
1980 & & 3428/2 = 1714 & & 144/4 = 36 \\
 & 5104 & & 1455/3 = 485 \\
7084 & & 6338/2 = 3169 \\
 & 11442 \\
18526
\end{array}\tag{15}$$
The leftmost column of this tableau is a listing of the given values of $W(0), W(1), \ldots, W(4)$; the $k$th succeeding column is obtained by computing the difference between successive values of the preceding column and dividing by $k$. The coefficients $\theta_j$ appear at the top of the columns, so that $\theta_0 = 10$, $\theta_1 = 294$, $\ldots$, $\theta_4 = 36$, and we have
$$W(x) = 36x^{\underline{4}} + 341x^{\underline{3}} + 691x^{\underline{2}} + 294x^{\underline{1}} + 10 = \bigl(((36(x-3) + 341)(x-2) + 691)(x-1) + 294\bigr)x + 10. \tag{16}$$
In general, we can write
$$W(x) = \Bigl(\ldots\bigl((\theta_{m-1}(x-m+2) + \theta_{m-2})(x-m+3) + \theta_{m-3}\bigr)(x-m+4) + \cdots + \theta_1\Bigr)x + \theta_0,$$
and this formula shows how the coefficients $W_{m-1}, \ldots, W_1, W_0$ can be obtained from the $\theta$'s:
$$\begin{array}{rrrrr}
36 & 341 \\
 & -3\cdot 36 \\ \hline
36 & 233 & 691 \\
 & -2\cdot 36 & -2\cdot 233 \\ \hline
36 & 161 & 225 & 294 \\
 & -1\cdot 36 & -1\cdot 161 & -1\cdot 225 \\ \hline
36 & 125 & 64 & 69 & 10
\end{array}\tag{17}$$
Here the numbers below the horizontal lines successively show the coefficients of the polynomials $\theta_4(x-3) + \theta_3$, $\bigl(\theta_4(x-3) + \theta_3\bigr)(x-2) + \theta_2$, etc. From this tableau we have
$$W(x) = 36x^4 + 125x^3 + 64x^2 + 69x + 10,$$
so the answer to our original problem is $1234 \cdot 2341 = W(16)$, where $W(16)$ is obtained by adding and shifting. A generalization of this method for obtaining coefficients is discussed in Section 4.6.4.

The basic Stirling number identity,
$$x^n = {n \brace n}x^{\underline{n}} + \cdots + {n \brace 1}x^{\underline{1}} + {n \brace 0},$$
Eq. 1.2.6–41, shows that if the coefficients of $W(x)$ are nonnegative, so are the numbers $\theta_j$, and in such a case all of the intermediate results in the above computation are nonnegative. This further simplifies the Toom–Cook multiplication algorithm, which we will now consider in detail.

… which
are manipulated during this algorithm:

Stack U, V: Temporary storage of $U(j)$ and $V(j)$ in step C4.
Stack C: Numbers to be multiplied, and control codes.
Stack W: Storage of $W(j)$.

These stacks may contain either binary numbers or special symbols called code-1, code-2, code-3, and code-4. The algorithm also constructs an auxiliary table of numbers $q_k$, $r_k$; this table is maintained in such a manner that it may be stored as a linear list, and all accesses to this table are made in a simple manner so that a single pointer (which traverses the list, moving back and forth) may be used to access the current table entry of interest.

(Stacks C and W in this algorithm are used to control the recursive mechanism of the multiplication algorithm in a reasonably straightforward manner which is a special case of the general procedures discussed in Chapter 8.)

C1. [Compute q, r tables.] Set stacks U, V, C, and W empty. Set
$$k \leftarrow 1, \quad q_0 \leftarrow q_1 \leftarrow 16, \quad r_0 \leftarrow r_1 \leftarrow 4, \quad Q \leftarrow 4, \quad R \leftarrow 2.$$
Now if $q_{k-1} + q_k < n$, set
$$k \leftarrow k + 1, \quad Q \leftarrow Q + R, \quad R \leftarrow \lfloor \sqrt{Q} \rfloor, \quad q_k \leftarrow 2^Q, \quad r_k \leftarrow 2^R,$$
and repeat this operation until $q_{k-1} + q_k \ge n$. (Note: The calculation of $R \leftarrow \lfloor \sqrt{Q} \rfloor$ does not require a square root to be taken, since we may simply set $R \leftarrow R + 1$ if $(R+1)^2 \le Q$ and leave $R$ unchanged if $(R+1)^2 > Q$; see exercise 2. In this step we build the sequence
$$\begin{array}{c|cccccccc}
k & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \ldots\\ \hline
q_k & 2^4 & 2^4 & 2^6 & 2^8 & 2^{10} & 2^{13} & 2^{16} & \ldots\\
r_k & 2^2 & 2^2 & 2^2 & 2^2 & 2^3 & 2^3 & 2^4 & \ldots
\end{array}$$
The multiplication of 70000-bit numbers would cause this step to terminate with $k = 6$, since $70000 < 2^{13} + 2^{16}$.)

C2. [Put u, v on stack.] Put code-1 on stack C, then place $u$ and $v$ onto stack C as numbers of exactly $q_{k-1} + q_k$ bits each.

C3. [Check recursion level.] Decrease $k$ by 1. If $k = 0$, the top of stack C contains two 32-bit numbers, $u$ and $v$; set $w \leftarrow uv$ using a built-in routine for multiplying 32-bit numbers, and go to step C10. If $k > 0$, set $r \leftarrow r_k$, $q \leftarrow q_k$, $p \leftarrow q_{k-1} + q_k$.
folio 379 galley 12
C4. [Break into $r+1$ parts.] Let the number at the top of stack C be regarded as a list of $r+1$ numbers with $q$ bits each, $(U_r \ldots U_1 U_0)_{2^q}$. (The top of stack C now contains an $(r+1)q = (q_k + q_{k+1})$-bit number.) For $j = 0, 1, \ldots, 2r$ compute the $p$-bit numbers
$$(\ldots(U_r j + U_{r-1})j + \cdots + U_1)j + U_0 = U(j)$$
and successively put these values onto stack U. (The bottom of stack U now contains $U(0)$, then comes $U(1)$, etc., with $U(2r)$ on top. Note that
$$U(j) \le U(2r) < 2^q\bigl((2r)^r + (2r)^{r-1} + \cdots + 1\bigr) < 2^{q+1}(2r)^r \le 2^p,$$
by exercise 3.) Then remove $U_r \ldots U_1 U_0$ from stack C.

Now the top of stack C contains another list of $r+1$ $q$-bit numbers, $V_r \ldots V_1 V_0$, and the $p$-bit numbers
$$(\ldots(V_r j + V_{r-1})j + \cdots + V_1)j + V_0 = V(j)$$
should be put onto stack V in the same way. After this has been done, remove $V_r \ldots V_1 V_0$ from stack C.

C5. [Recurse.] Successively put the following items onto stack C, at the same time emptying stacks U and V:
$$\text{code-2},\ V(2r),\ U(2r),\ \text{code-3},\ V(2r-1),\ U(2r-1),\ \ldots,\ \text{code-3},\ V(1),\ U(1),\ \text{code-3},\ V(0),\ U(0).$$
Put code-4 onto stack W. Go back to step C3.

C6. [Save one product.] (At this point the multiplication algorithm has set $w$ to one of the products $W(j) = U(j)V(j)$.) Put $w$ onto stack W. (This number $w$ contains $2(q_k + q_{k-1})$ bits.)
Go back to step C3.

Fig. 8. Toom–Cook algorithm for high-precision multiplication.

C7. [Find $\theta$'s.] Set $r \leftarrow r_k$, $q \leftarrow q_k$, $p \leftarrow q_{k-1} + q_k$. (At this point stack W contains $\ldots$, code-4, $W(0), W(1), \ldots, W(2r)$ from bottom to top, where each $W(j)$ is a $2p$-bit number.)

Now for $j = 1, 2, 3, \ldots, 2r$, perform the following loop: For $t = 2r, 2r-1, 2r-2, \ldots, j$ set $W(t) \leftarrow \bigl(W(t) - W(t-1)\bigr)/j$. (Here $j$ must increase and $t$ must decrease. The quantity $\bigl(W(t) - W(t-1)\bigr)/j$ will always be a nonnegative integer which fits in $2p$ bits; cf. (15).)

C8. [Find $W$'s.] For $j = 2r-1, 2r-2, \ldots, 1$, perform the following loop: For $t = j, j+1, \ldots, 2r-1$ set $W(t) \leftarrow W(t) - jW(t+1)$. (Here $j$ must decrease and $t$ must increase. The result of this operation will again be a nonnegative $2p$-bit integer; cf. (17).)

C9. [Set answer.] Set $w$ to the $2(q_k + q_{k+1})$-bit integer
$$\bigl(\ldots(W(2r)\,2^q + W(2r-1))\,2^q + \cdots\bigr)\,2^q + W(0).$$
Remove $W(2r), \ldots, W(0)$ and code-4 from stack W.

C10. [Return.] Set $k \leftarrow k + 1$. Remove the top of stack C. If it is code-3, go to C6. If it is code-2, put $w$ onto stack W and go to C7. And if it is code-1, terminate the algorithm ($w$ is the answer).

Let us now estimate the running time, $T(n)$, for Algorithm C, in terms of some things we shall call "cycles," i.e., elementary machine operations. Step C1 takes $O(q_k)$ cycles, even if we represent the number $q_k$ internally as a long string of $q_k$ bits followed by some delimiter, since $q_k + q_{k-1} + \cdots + q_0$ will be $O(q_k)$. Step C2 obviously takes $O(q_k)$ cycles.

Now let $t_k$ denote the amount of computation required to get from step C3 to step C10 for a particular value of $k$ (after $k$ has been decreased at the beginning of step C3). Step C3 requires $O(q)$ cycles at most. Step C4 involves $r$ multiplications of a $\lg(r+1)$-bit number by a $p$-bit number, and $r$ additions of $p$-bit numbers, all repeated $4r+2$ times. Thus we need a total of $O(r^2 q \ln r)$ cycles. Step C5 requires moving $4r+2$ $p$-bit numbers, so it involves $O(rq)$ cycles. Step C6 requires $O(q)$ cycles, and it is done $2r+1$ times per iteration. The recursion involved when the algorithm essentially invokes itself (by returning to step C3) requires $t_{k-1}$ cycles, $2r+1$ times. Step C7 requires $O(r^2)$ subtractions of $p$-bit numbers and divisions of $2p$-bit by $(\lg r)$-bit numbers, so it requires $O(r^2 q \ln r)$ cycles. Similarly, step C8 requires $O(r^2 q \ln r)$ cycles. Step C9 involves $O(rq)$ cycles, and C10 takes hardly any time at all.

Summing up we have, for $q = q_k$ and $r = r_k$, $T(n) = O(q_k) + O(q_k) + t_{k-1}$, where
$$\begin{aligned}
t_k &= O(q) + O(r^2 q \ln r) + O(rq) + (2r+1)\,O(q) + O(r^2 q \ln r)\\
&\qquad + O(r^2 q \ln r) + O(rq) + O(q) + (2r+1)\,t_{k-1}\\
&= O(r^2 q \ln r) + (2r+1)\,t_{k-1};
\end{aligned}$$
thus there is a constant $c$ such that
$$t_k \le c\,r_k^2 q_k \lg r_k + (2r_k + 1)\,t_{k-1}.$$
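The quantities in this recurrence are easy to tabulate. The Python sketch below (all function names hypothetical) builds the $q_k$, $r_k$ table the way step C1 does, counts the built-in 32-bit multiplications Algorithm C performs, namely $(2r_{k-1}+1)\cdots(2r_1+1)$ for the final $k$ of step C1, and iterates the recurrence. The values $c = 1$ and $t_0 = 1$ are placeholders assumed only for illustration, since the text asserts merely that some constant $c$ exists.

```python
import math


def qr_table(n: int) -> list[tuple[int, int]]:
    """Step C1: return [(q_0, r_0), (q_1, r_1), ..., (q_k, r_k)],
    stopping as soon as q_{k-1} + q_k >= n."""
    table = [(16, 4), (16, 4)]           # q_0 = q_1 = 16, r_0 = r_1 = 4
    Q, R = 4, 2                          # q_1 = 2**Q, r_1 = 2**R
    while table[-2][0] + table[-1][0] < n:
        Q += R
        # R <- floor(sqrt(Q)) without a square root: Q grew by R <= sqrt(old Q),
        # so floor(sqrt(Q)) can have grown by at most 1.
        if (R + 1) ** 2 <= Q:
            R += 1
        table.append((2 ** Q, 2 ** R))
    return table


def leaf_products(table: list[tuple[int, int]]) -> int:
    """Built-in 32-bit products done when step C1 ends with k = len(table) - 1:
    levels k-1, ..., 1 each multiply the count by 2 r_j + 1."""
    k = len(table) - 1
    count = 1
    for _, r in table[1:k]:
        count *= 2 * r + 1
    return count


def t_recurrence(table, c: float = 1.0, t0: float = 1.0) -> list[float]:
    """t_j = c r_j^2 q_j lg r_j + (2 r_j + 1) t_{j-1} for j = 1, ..., k.

    c and t0 are placeholders, not values given in the text."""
    t = [t0]
    for q, r in table[1:]:
        t.append(c * r * r * q * math.log2(r) + (2 * r + 1) * t[-1])
    return t


if __name__ == "__main__":
    table = qr_table(70000)
    k = len(table) - 1
    print(k)                            # 6, as stated at the end of step C1
    print(leaf_products(table))         # 9 * 9 * 9 * 17 * 17 = 210681
    print(t_recurrence(table)[k - 1])   # t_{k-1}, the dominant term of T(n)
```

For 70000-bit operands the count of 210681 built-in products compares with roughly $(70000/32)^2 \approx 4.8$ million 32-bit products for the conventional method.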
To complete the estimation of $t_k$ we can prove by brute force that
$$t_k \le C q_{k+1} 2^{2.5\sqrt{\lg q_{k+1}}} \tag{18}$$
for some constant $C$. Let us choose $C > 20c$, and let us also take $C$ large enough so that (18) is valid for $k \le k_0$, where $k_0$ will be specified below. Then when $k > k_0$, let $Q_k = \lg q_k$, $R_k = \lg r_k$; we have by induction
$$\begin{aligned}
t_k &\le c\,q_k r_k^2 \lg r_k + (2r_k+1)\,C q_k 2^{2.5\sqrt{Q_k}}\\
&= C q_{k+1} 2^{2.5\sqrt{\lg q_{k+1}}}\,(\eta_1 + \eta_2),
\end{aligned}$$
where
$$\begin{aligned}
\eta_1 &= \frac{c}{C}\,R_k 2^{R_k - 2.5\sqrt{Q_{k+1}}} < \frac{1}{20}\,R_k 2^{-R_k} < 0.05,\\
\eta_2 &= \Bigl(2 + \frac{1}{r_k}\Bigr) 2^{2.5(\sqrt{Q_k} - \sqrt{Q_{k+1}})} \to 2^{-1/4} < 0.85,
\end{aligned}$$
since
$$\sqrt{Q_{k+1}} - \sqrt{Q_k} = \sqrt{Q_k + \lfloor\sqrt{Q_k}\rfloor} - \sqrt{Q_k} \to \tfrac12$$
as $k \to \infty$. It follows that we can find $k_0$ such that $\eta_2 < 0.95$ for all $k > k_0$, and this completes the proof of (18) by induction.

Finally, therefore, we may compute $T(n)$. Since $n > q_{k-1} + q_{k-2}$, we have $q_{k-1} < n$; hence
$$r_{k-1} = 2^{\lfloor\sqrt{\lg q_{k-1}}\rfloor} < 2^{\sqrt{\lg n}} \qquad\text{and}\qquad q_k = r_{k-1}q_{k-1} < n\,2^{\sqrt{\lg n}}.$$
Thus
$$t_{k-1} \le C q_k 2^{2.5\sqrt{\lg q_k}} < C n\,2^{\sqrt{\lg n} + 2.5(\sqrt{\lg n} + 1)},$$
and, since $T(n) = O(q_k) + t_{k-1}$, we have finally the following theorem:

Theorem C. There is a constant $c_0$ such that the execution time of Algorithm C is less than $c_0 n 2^{3.5\sqrt{\lg n}}$ cycles.
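The induction above hinges on $\eta_2$ eventually dropping below 0.95. A quick numerical sketch makes the limit $2^{-1/4}$ concrete. It assumes, as the identity $\sqrt{Q_{k+1}} - \sqrt{Q_k} = \sqrt{Q_k + \lfloor\sqrt{Q_k}\rfloor} - \sqrt{Q_k}$ in the proof suggests, that $R_k = \lfloor\sqrt{Q_k}\rfloor$ and $Q_{k+1} = Q_k + R_k$; the starting value `Q0 = 4` and the name `eta2_values` are hypothetical choices made only for illustration.

```python
import math

def eta2_values(Q0=4, steps=2000):
    """Return (Q_k, eta_2) pairs along the assumed sequence Q_{k+1} = Q_k + floor(sqrt(Q_k)).

    Here Q_k = lg q_k, R_k = lg r_k = floor(sqrt(Q_k)), and
    eta_2 = (2 + 1/r_k) * 2^(2.5 (sqrt(Q_k) - sqrt(Q_{k+1}))).
    """
    Q = Q0
    pairs = []
    for _ in range(steps):
        R = math.isqrt(Q)                 # R_k = floor(sqrt(Q_k)), so r_k = 2**R_k
        Q_next = Q + R                    # q_{k+1} = q_k * r_k, in logarithmic form
        eta2 = (2 + 2.0 ** -R) * 2 ** (2.5 * (math.sqrt(Q) - math.sqrt(Q_next)))
        pairs.append((Q, eta2))
        Q = Q_next
    return pairs

if __name__ == "__main__":
    pairs = eta2_values()
    print(max(e for Q, e in pairs if Q > 100))   # below 0.95
    print(pairs[-1][1], 2 ** -0.25)              # close to 2^(-1/4)
```

The early values of $\eta_2$ can exceed 1, which is why the proof only needs (18) to be forced by the choice of $C$ for $k \le k_0$.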
This result is noticeably stronger than Theorem A, since $n2^{3.5\sqrt{\lg n}} = n^{1 + 3.5/\sqrt{\lg n}}$. By adding a few complications to the algorithm, pushing the ideas to their apparent limits (see exercise 5), we can improve the estimated execution time to
$$T(n) = O\bigl(n 2^{\sqrt{2\lg n}}\log n\bigr). \tag{19}$$

B. A modular method. There is another way to multiply large numbers very rapidly, based on the ideas of modular arithmetic as presented in Section 4.3.2. It is very hard to believe at first that this method can be of any advantage, since a multiplication algorithm based on modular arithmetic must include the choice of moduli and the conversion of numbers into and out of modular representation, besides the actual multiplication operation itself. In spite of these formidable difficulties, A. Schönhage …
In order to understand the essential mechanism of Schönhage's method, we shall look at a special case. Consider the sequence defined by the rules
$$q_0 = 1, \qquad q_{k+1} = 3q_k - 1, \tag{20}$$
so that $q_k = 3^k - 3^{k-1} - \cdots - 1 = \tfrac12(3^k + 1)$. We will study a procedure that multiplies $(18q_k + 8)$-bit numbers, in terms of a method for multiplying $(18q_{k-1} + 8)$-bit numbers. Thus, if we know how to multiply numbers having $18q_0 + 8 = 26$ bits, the procedure to be described will show us how to multiply numbers of $18q_1 + 8 = 44$ bits, then 98 bits, then 260 bits, etc., eventually increasing the number of bits by almost a factor of 3 at each step.

Let $p_k = 18q_k + 8$. When multiplying $p_k$-bit numbers, the idea is to use the six moduli
$$\begin{aligned}
m_1 &= 2^{6q_k-1} - 1, & m_2 &= 2^{6q_k+1} - 1, & m_3 &= 2^{6q_k+2} - 1,\\
m_4 &= 2^{6q_k+3} - 1, & m_5 &= 2^{6q_k+5} - 1, & m_6 &= 2^{6q_k+7} - 1.
\end{aligned} \tag{21}$$
These moduli are relatively prime, by Eq. 4.3.2–(18), since the exponents
$$6q_k - 1,\quad 6q_k + 1,\quad 6q_k + 2,\quad 6q_k + 3,\quad 6q_k + 5,\quad 6q_k + 7 \tag{22}$$
are always relatively prime (see exercise 6). The six moduli in (21) are capable of representing numbers up to $m = m_1m_2m_3m_4m_5m_6 > 2^{36q_k+16} = 2^{2p_k}$, so there is no chance of overflow in the multiplication of $p_k$-bit numbers $u$ and $v$. Thus we may use the following method:

a) Compute $u_1 = u \bmod m_1, \ldots, u_6 = u \bmod m_6$; $v_1 = v \bmod m_1, \ldots, v_6 = v \bmod m_6$.

b) Multiply $u_1$ by $v_1$, $u_2$ by $v_2$, …, $u_6$ by $v_6$. These are numbers of at most $6q_k + 7 = 18q_{k-1} + 1 < p_{k-1}$ bits, so the multiplications can be performed by using the assumed $p_{k-1}$-bit multiplication procedure.

c) Compute $w_1 = u_1v_1 \bmod m_1$, $w_2 = u_2v_2 \bmod m_2$, …, $w_6 = u_6v_6 \bmod m_6$.

d) Compute $w$ such that $0 \le w < m$, $w \bmod m_1 = w_1$, …, $w \bmod m_6 = w_6$.
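Steps (a) through (d) can be exercised end to end in a few lines of Python. This is only a sketch under simplifying assumptions, not Schönhage's fast method: the residue products in (b) use Python's built-in multiplication in place of the recursive $p_{k-1}$-bit routine, and step (d) uses the mixed-radix conversion (26) and (27) discussed below, with each $c_{ij}$ obtained from `pow(m_i, -1, m_j)` (Python 3.8 or later). The names `moduli`, `mixed_radix_crt`, and `modular_multiply` are hypothetical.

```python
import math
import random

def moduli(q):
    """Exponents (22) and moduli (21) for a given q = q_k."""
    exps = [6 * q + d for d in (-1, 1, 2, 3, 5, 7)]
    return exps, [(1 << e) - 1 for e in exps]

def mixed_radix_crt(ws, ms):
    """Step (d): the unique w with 0 <= w < prod(ms) and w ≡ ws[j] (mod ms[j])."""
    digits = []                                   # w'_1, w'_2, ..., as in (26)
    for j, (wj, mj) in enumerate(zip(ws, ms)):
        t = wj % mj
        for i in range(j):
            c_ij = pow(ms[i], -1, mj)             # c_ij * m_i ≡ 1 (modulo m_j)
            t = (t - digits[i]) * c_ij % mj
        digits.append(t)
    w = digits[-1]                                # Horner evaluation of (27)
    for i in range(len(ms) - 2, -1, -1):
        w = w * ms[i] + digits[i]
    return w

def modular_multiply(u, v, q):
    """Multiply p-bit numbers u, v (p = 18q + 8) via residues modulo (21)."""
    _, ms = moduli(q)
    us = [u % m for m in ms]                              # step (a)
    vs = [v % m for m in ms]
    ws = [(a * b) % m for a, b, m in zip(us, vs, ms)]     # steps (b) and (c)
    return mixed_radix_crt(ws, ms)                        # step (d)

if __name__ == "__main__":
    q = 2                                  # q_1 from (20), so p = 44
    p = 18 * q + 8
    print(math.prod(moduli(q)[1]) > 2 ** (2 * p))
    u, v = random.getrandbits(p), random.getrandbits(p)
    print(modular_multiply(u, v, q) == u * v)
```

With $q = 2$ the exponents (22) are 11, 13, 14, 15, 17, 19, which are pairwise relatively prime, and the product of the six moduli exceeds $2^{88}$.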
Let $t_k$ be the amount of time needed for this process. It is not hard to see that operation (a) takes $O(p_k)$ cycles, since the determination of $u \bmod (2^e - 1)$ is quite simple (like "casting out nines"), as shown in Section 4.3.2. Similarly, operation (c) takes $O(p_k)$ cycles. Operation (b) requires essentially $6t_{k-1}$ cycles. This leaves us with operation (d), which seems to be quite a difficult computation; but Schönhage has found an ingenious way to perform step (d) in $O(p_k\log p_k)$ cycles, and this is the crux of the method. As a consequence, we have
$$t_k = 6t_{k-1} + O(p_k\log p_k).$$
Since $p_k = 3^{k+2} + 17$, we can show that
$$t_k = O(6^k) = O\bigl(p_k^{\log_3 6}\bigr), \qquad \log_3 6 \approx 1.631. \tag{23}$$
(See exercise 7.)

So although this method is more complicated than the $O(n^{\lg 3})$ procedure given at the beginning of the section, it does, in fact, lead to an execution time substantially better than $O(n^2)$ for the multiplication of $n$-bit numbers. Thus we can improve on the classical method by using either of two completely different approaches.

Let us now analyze operation (d) above. Assume that we are given the positive integers $e_1 < e_2 < \cdots < e_r$, relatively prime in pairs; let
$$m_1 = 2^{e_1} - 1,\quad m_2 = 2^{e_2} - 1,\quad \ldots,\quad m_r = 2^{e_r} - 1. \tag{24}$$
We are also given numbers $w_1, \ldots, w_r$ such that $0 \le w_j \le m_j$. Our job is to determine the binary representation of the number $w$ that satisfies the conditions
$$0 \le w < m_1m_2\ldots m_r, \qquad w \equiv w_1 \pmod{m_1},\ \ldots,\ w \equiv w_r \pmod{m_r}. \tag{25}$$
The method is based on (23) and (24) of Section 4.3.2; first we compute
$$w'_j = \Bigl(\cdots\bigl((w_j - w'_1)c_{1j} - w'_2\bigr)c_{2j} - \cdots - w'_{j-1}\Bigr)c_{(j-1)j} \bmod m_j \tag{26}$$
for $j = 2, \ldots, r$, where $w'_1 = w_1 \bmod m_1$; then we compute
$$w = \Bigl(\cdots\bigl(w'_r m_{r-1} + w'_{r-1}\bigr)m_{r-2} + \cdots + w'_2\Bigr)m_1 + w'_1. \tag{27}$$
Here $c_{ij}$ is a number such that $c_{ij}m_i \equiv 1 \pmod{m_j}$; these numbers $c_{ij}$ are not given, so they must be determined from the $e_j$'s.

The calculation of (26) for all $j$ involves $\binom{r}{2}$ additions modulo $m_j$, each of which takes $O(e_r)$ cycles, plus $\binom{r}{2}$ multiplications by $c_{ij}$, modulo $m_j$. The calculation of $w$ by formula (27) involves $r$ additions and $r$ multiplications by $m_j$; it is easy to multiply by $m_j$, since this is just adding, shifting, and subtracting, so it is clear that the evaluation of Eq. (27) takes $O(r^2e_r)$ cycles. We will soon see that each of the multiplications by $c_{ij}$, modulo $m_j$, requires only $O(e_r\log e_r)$ cycles, and therefore the entire job of conversion can be done in $O(r^2e_r\log e_r)$ cycles.

The above observations leave us with the following problem to solve: Given positive integers $e < f$ and a nonnegative integer $u < 2^f$, compute $(cu) \bmod (2^f - 1)$, where $c$ is the number such that $(2^e - 1)c \equiv 1 \pmod{2^f - 1}$; and we must do this in $O(f\log f)$ cycles. The result of exercise 4.3.2–6 gives a formula for $c$ that suggests a procedure that can be used. First we find the least positive integer $b$ such that
$$be \equiv 1 \pmod{f}. \tag{28}$$
This can be done using Euclid's algorithm in $O\bigl((\log f)^3\bigr)$ cycles, since Euclid's algorithm applied to $e$ and $f$ requires $O(\log f)$ iterations, and each iteration requires $O\bigl((\log f)^2\bigr)$ cycles; alternatively, we could be very sloppy here without violating the total time constraint, by simply trying $b = 1$, 2, etc., until (28) is satisfied, and such a process would take $O(f\log f)$ cycles in all. Once $b$ has been found, exercise 4.3.2–6 tells us that
$$c = c[b] = \Bigl(\sum_{0\le j<b} 2^{ej}\Bigr) \bmod (2^f - 1). \tag{29}$$

A brute-force multiplication of $(cu) \bmod (2^f - 1)$ would not be good enough to solve the problem, since we do not know how to multiply general $f$-bit numbers in $O(f\log f)$ cycles. But the special form of $c$ provides a clue: The binary representation of $c$ is composed of bits in a regular pattern, and Eq. (29) shows that the number $c[2b]$ can be obtained in a simple way from $c[b]$, namely $c[2b] = \bigl((1 + 2^{eb})\,c[b]\bigr) \bmod (2^f - 1)$. This suggests that we can rapidly multiply a number $u$ by $c[b]$ if we build $c[b]u$ up in $\lg b$ steps in a suitably clever manner, such as the following: Let the binary notation for $b$ be
$$b = (b_s\ldots b_2b_1b_0)_2;$$
we may calculate the sequences $a_k$, $d_k$, $u_k$, $v_k$, which are defined by the rules
$$\begin{aligned}
a_0 &= e, & a_k &= 2a_{k-1} \bmod f;\\
d_0 &= b_0e, & d_k &= (d_{k-1} + b_ka_k) \bmod f;\\
u_0 &= u, & u_k &= (u_{k-1} + 2^{a_{k-1}}u_{k-1}) \bmod (2^f - 1);\\
v_0 &= b_0u, & v_k &= (v_{k-1} + b_k2^{d_{k-1}}u_k) \bmod (2^f - 1).
\end{aligned} \tag{30}$$
It is easy to prove by induction on $k$ that
$$\begin{aligned}
a_k &= (2^ke) \bmod f; & d_k &= \bigl((b_k\ldots b_1b_0)_2\,e\bigr) \bmod f;\\
u_k &= (c[2^k]\,u) \bmod (2^f - 1); & v_k &= \bigl(c[(b_k\ldots b_1b_0)_2]\,u\bigr) \bmod (2^f - 1).
\end{aligned} \tag{31}$$
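The recurrences (30) use only additions and multiplications by powers of 2 modulo $2^f - 1$, and the latter are cyclic rotations of an $f$-bit word. The Python sketch below checks (28) through (31) on small cases. It finds $b$ by the "sloppy" trial search mentioned above, computes $c[b]$ directly from (29) only for comparison, and assumes $e$ and $f$ are relatively prime with $f \ge 2$; the names `least_b`, `rotl`, `c_direct`, and `times_c` are hypothetical.

```python
def least_b(e, f):
    """Least positive b with b*e ≡ 1 (modulo f), by trial, as allowed in the text."""
    for b in range(1, f + 1):
        if (b * e) % f == 1:
            return b
    raise ValueError("e and f must be relatively prime, with f >= 2")

def rotl(x, s, f):
    """(2^s * x) mod (2^f - 1) for 0 <= x < 2^f - 1 and 0 <= s < f: an f-bit rotation."""
    mask = (1 << f) - 1
    return ((x << s) & mask) | (x >> (f - s))

def c_direct(b, e, f):
    """c[b] from Eq. (29); used here only to check the fast method."""
    return sum(1 << (e * j) for j in range(b)) % ((1 << f) - 1)

def times_c(u, e, f):
    """(c[b] * u) mod (2^f - 1) via the recurrences (30); returns v_s."""
    mod = (1 << f) - 1
    b = least_b(e, f)
    bits = [int(ch) for ch in reversed(bin(b)[2:])]   # b_0, b_1, ..., b_s
    a = e % f                                         # a_0
    d = (bits[0] * e) % f                             # d_0
    uk = u % mod                                      # u_0
    vk = uk if bits[0] else 0                         # v_0 = b_0 u
    for k in range(1, len(bits)):
        uk = (uk + rotl(uk, a, f)) % mod              # u_k uses a_{k-1}
        a = (2 * a) % f                               # a_k
        if bits[k]:
            vk = (vk + rotl(uk, d, f)) % mod          # v_k uses d_{k-1} and u_k
            d = (d + a) % f                           # d_k uses a_k
    return vk

if __name__ == "__main__":
    e, f = 3, 8                 # b = 3, c[3] = 1 + 2^3 + 2^6 = 73
    print(least_b(e, f), c_direct(least_b(e, f), e, f), times_c(1, e, f))
```

Each pass of the loop costs $O(f)$ bit operations, and there are $s = \lfloor\lg b\rfloor$ passes.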
Hence the desired result, $(c[b]u) \bmod (2^f - 1)$, is $v_s$. The calculation of $a_k$, $d_k$, $u_k$, $v_k$ from $a_{k-1}$, $d_{k-1}$, $u_{k-1}$, $v_{k-1}$ takes $O(\log f) + O(\log f) + O(f) + O(f) = O(f)$ cycles, since multiplying by a power of 2 modulo $2^f - 1$ is just a cyclic shift of an $f$-bit word; therefore, because $s = \lfloor\lg b\rfloor < \lg f$, the entire calculation can be done in $sO(f) = O(f\log f)$ cycles as desired.

The reader will find it instructive to study the ingenious method represented by (30) and (31) very carefully. Similar techniques are discussed in Section 4.6.3.

Schönhage's paper [Computing 1 (1966), 182–196] shows that these ideas can be extended to the multiplication of $n$-bit numbers using $r \approx 2^{\sqrt{2\lg n}}$ moduli, obtaining a method analogous to Algorithm C. We shall not dwell on the details here, since Algorithm C is always superior; in fact, an even better method is next on our agenda.

C. Use of Fourier transforms. The critical problem in high-precision multiplication is the determination of "convolution products" such as
$$u_rv_0 + u_{r-1}v_1 + \cdots + u_0v_r,$$
and there is an intimate relation between convolutions and finite Fourier transforms. If $\omega = \exp(2\pi i/K)$ is a $K$th root of unity, the one-dimensional Fourier transform of $(u_0, u_1, \ldots, u_{K-1})$ may be defined to be $(\hat u_0, \hat u_1, \ldots, \hat u_{K-1})$, where
$$\hat u_s = \sum_{0\le t<K}\omega^{st}u_t, \qquad 0 \le s < K. \tag{32}$$
Letting $(\hat v_0, \hat v_1, \ldots, \hat v_{K-1})$ be defined in the same way, as the transform of $(v_0, v_1, \ldots, v_{K-1})$, it is not difficult to see that $(\hat u_0\hat v_0, \hat u_1\hat v_1, \ldots, \hat u_{K-1}\hat v_{K-1})$ is the transform of $(w_0, w_1, \ldots, w_{K-1})$, where
$$\begin{aligned}
w_r &= u_rv_0 + u_{r-1}v_1 + \cdots + u_0v_r + u_{K-1}v_{r+1} + \cdots + u_{r+1}v_{K-1}\\
&= \sum_{i+j\equiv r \pmod K} u_iv_j.
\end{aligned}$$
When $K \ge 2n$ and $u_n = u_{n+1} = \cdots = u_{K-1} = v_n = \cdots = v_{K-1} = 0$, the $w$'s are just what we need for multiplication; the transform of a convolution product is the ordinary product of the transforms. This idea is a special case of Toom's use of polynomials (cf. (10)), with $x$ replaced by roots of unity.

The above property of Fourier transforms was exploited by V. Strassen in 1968, using a sufficiently precise binary representation of the complex number $\omega$, to multiply large numbers faster than was possible under all previously known schemes. In 1970, he and A. Schönhage found an elegant way to modify the method, avoiding all the complications of complex numbers and obtaining a very pretty algorithm capable of multiplying two $n$-bit numbers in $O(n\log n\log\log n)$ steps. We shall now study their remarkable approach [cf. Computing 7 (1971), 281–292], in a simplified form suggested by V. R. Pratt.
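The convolution property described above (the transform of a cyclic convolution is the componentwise product of the transforms) is easy to check numerically with the complex transform (32). This Python sketch evaluates (32) directly in $O(K^2)$ operations rather than with a fast transform, and it uses floating-point complex arithmetic, so the comparison needs a tolerance; the names `dft` and `cyclic_convolution` are hypothetical.

```python
import cmath

def dft(xs):
    """Transform (32): x̂_s = sum_{0<=t<K} ω^(s t) x_t, with ω = exp(2πi/K)."""
    K = len(xs)
    return [sum(cmath.exp(2j * cmath.pi * s * t / K) * xs[t] for t in range(K))
            for s in range(K)]

def cyclic_convolution(us, vs):
    """w_r = sum of u_i v_j over all i + j ≡ r (modulo K)."""
    K = len(us)
    ws = [0] * K
    for i in range(K):
        for j in range(K):
            ws[(i + j) % K] += us[i] * vs[j]
    return ws

if __name__ == "__main__":
    # n = 4 digits each, padded with zeros to K = 8 >= 2n
    us = [3, 1, 4, 1, 0, 0, 0, 0]
    vs = [2, 7, 1, 8, 0, 0, 0, 0]
    ws = cyclic_convolution(us, vs)
    print(ws)                                          # [6, 23, 18, 55, 19, 33, 8, 0]
    lhs = dft(ws)
    rhs = [a * b for a, b in zip(dft(us), dft(vs))]
    print(max(abs(x - y) for x, y in zip(lhs, rhs)))   # rounding error only
```

Read as decimal digits, least significant first, the example computes $1413 \times 8172 = 11547036$; because of the zero padding, no term wraps around modulo $K$.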
It is convenient in the first place to replace $n$ by $2^n$, and to seek a procedure that multiplies $2^n$-bit numbers in $O(2^n n\log n)$ steps. Roughly speaking, we shall reduce the problem of multiplying $2^n$-bit numbers to the problem of doing about $2^{n/2}$ multiplications of $2^{n/2}$-bit numbers, with $O(2^n n)$ auxiliary steps required to piece these products together properly; then there will be $\lg n$ levels of recursion with $O(2^n n)$ steps per level, making a total of $O(2^n n\log n)$ steps as desired.

Let $N = 2^n$ and suppose we wish to compute the product of $u$ and $v$, where $0 \le u, v < 2^N$. As in Algorithm C we shall break these $N$-bit numbers into groups; let
$$k + l = n + 1, \qquad K = 2^k, \qquad L = 2^l,$$
and write
$$u = (U_{K/2-1}\ldots U_1U_0)_{2^L}, \qquad v = (V_{K/2-1}\ldots V_1V_0)_{2^L},$$
regarding $u$ and $v$ as $2^{k-1}$ groups of $2^l$-bit numbers. We will select appropriate values for $k$ and $l$ later; it turns out (see exercise 10) that we will need to have
$$4 \le k \le l + 3, \tag{33}$$
but no other conditions. The above representation of $u$ and $v$ implies as before that
$$u\cdot v = W_{K-2}2^{(K-2)L} + \cdots + W_12^L + W_0, \tag{34}$$
where
$$W_r = \sum_{i+j=r}U_iV_j = \sum_{i+j\equiv r \pmod K}U_iV_j, \tag{35}$$
if we define $U_i = V_j = 0$ for $i, j \ge K/2$. Clearly $0 \le W_r \le (K/2)(2^L - 1)^2 < 2^{2L+k-1}$; therefore if we knew the $W$'s, we could compute $u\cdot v$ by adding up the terms in (34), in $O\bigl(K(2L + k)\bigr) = O(N)$ further steps.
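The bookkeeping in (33) through (35) can be checked directly. The Python sketch below splits $u$ and $v$ into $K/2$ groups of $L$ bits, forms every $W_r$ by the plain double sum (35) (so it is quadratic, not the fast method), and reassembles $u\cdot v$ by (34); the example parameters are those of the 4096-bit case discussed next, and the function names are hypothetical.

```python
def split_groups(x, L, count):
    """Radix-2^L digits of x, least significant first (U_0, U_1, ...)."""
    mask = (1 << L) - 1
    return [(x >> (L * i)) & mask for i in range(count)]

def convolution_products(u, v, k, l):
    """Return W_0, ..., W_{K-2} of Eq. (35) for 2^(k-1) groups of 2^l bits."""
    K, L = 1 << k, 1 << l
    U = split_groups(u, L, K // 2)
    V = split_groups(v, L, K // 2)
    W = [0] * (K - 1)
    for i, Ui in enumerate(U):
        for j, Vj in enumerate(V):
            W[i + j] += Ui * Vj
    return W

def reassemble(W, L):
    """Eq. (34): u*v = sum_r W_r 2^(rL)."""
    return sum(w << (r * L) for r, w in enumerate(W))

if __name__ == "__main__":
    n, k, l = 12, 8, 5                      # k + l = n + 1
    N = 1 << n
    u = 3 ** 2500 % (1 << N)
    v = 7 ** 1400 % (1 << N)
    W = convolution_products(u, v, k, l)
    print(reassemble(W, 1 << l) == u * v)
    print(max(W).bit_length() <= 2 * (1 << l) + k - 1)   # the bound 2^(2L+k-1)
```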
Our goal is to compute the $W_r$ exactly, and we can do this by computing their values mod $M$, where $M$ is any number larger than $(K/2)(2^L - 1)^2$. The key idea is that we can choose $M = 2^{4L} + 1$, and compute the $W_r$ by doing a "fast Fourier transform" modulo $M$, where the $K$th root of unity $\omega$ we use is a power of 2 (so that multiplication by powers of $\omega$ is very simple).

Before discussing this idea in detail, a numerical example of what we've said so far may help to fix the ideas. Suppose that we want to multiply two 4096-bit numbers, obtaining an 8192-bit product; thus $n = 12$ in the above discussion, and we may choose $k = 8$, $l = 5$. The bits of $u$ and $v$ are partitioned into 128 groups of 32 bits each, and the basic idea is to find the 256 convolution products (35) and to add them together (after appropriate shifting). The convolution products have at most $64 + 7 = 71$ bits each, so it surely suffices to determine them modulo $M = 2^{128} + 1$. We will see that it is possible to find the convolution products rapidly by first computing their finite Fourier transform mod $M$, using the integer $\omega = 2$ as a 256th "root of unity." These integer calculations mod $M$ turn out to have all the necessary properties of complex roots of unity in the ordinary Fourier transform (32).

Arithmetic mod $(2^m + 1)$ is somewhat similar to ones' complement arithmetic, mod $(2^m - 1)$, although it is slightly more complicated; we have already investigated the idea briefly in Section 3.2.1.1. Numbers can be represented as $m$-bit quantities in binary notation, except for the special value $-1 \equiv 2^m$, which may be represented in some special way. Addition mod $(2^m + 1)$ is easily done in $O(m)$ cycles, since a carry off the left end merely means we must subtract 1 at the right; similarly, subtraction mod $(2^m + 1)$ is quite simple. Furthermore, we can multiply by $2^r$ in $O(m)$ cycles, when $0 \le r < m$, since
$$2^r\cdot(u_{m-1}\ldots u_1u_0)_2 \equiv (u_{m-r-1}\ldots u_0\underbrace{0\ldots0}_{r})_2 - (\underbrace{0\ldots0}_{m-r}u_{m-1}\ldots u_{m-r})_2 \pmod{2^m + 1}. \tag{36}$$
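A small Python sketch of the mod $(2^m + 1)$ operations just described follows. Residues are kept as integers in $[0, 2^m]$, with $2^m$ standing for the special value $-1$; this representation choice and the names `add_fermat` and `times_pow2` are assumptions made for illustration.

```python
def add_fermat(a, b, m):
    """(a + b) mod (2^m + 1) for residues 0 <= a, b <= 2^m."""
    s = a + b
    # Bits at position m and above are carries off the left end; since
    # 2^m ≡ -1, each unit carried out is subtracted back in at the right.
    s = (s & ((1 << m) - 1)) - (s >> m)
    return s if s >= 0 else s + (1 << m) + 1

def times_pow2(u, r, m):
    """(2^r * u) mod (2^m + 1) for 0 <= u <= 2^m and 0 <= r < m, using Eq. (36)."""
    M = (1 << m) + 1
    if u == 1 << m:                          # the special value -1
        return (-(1 << r)) % M
    lo = (u << r) & ((1 << m) - 1)           # (u_{m-r-1} ... u_0 0...0)_2
    hi = u >> (m - r)                        # (0...0 u_{m-1} ... u_{m-r})_2
    return (lo - hi) % M

if __name__ == "__main__":
    m = 8                                    # modulus 257
    print(add_fermat(200, 100, m), times_pow2(200, 3, m))
```

With $m = 128$, the same identity $2^{128} \equiv -1$ shows why $\omega = 2$ serves as a 256th root of unity modulo $M = 2^{128} + 1$ in the numerical example: $2^{256} \equiv 1$.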
Given a sequence of $K = 2^k$ integers $(a_0, \ldots, a_{K-1})$, and an integer $\omega$ such that $\omega^K \equiv 1 \pmod M$, the integer Fourier transform
$$\hat a_s = \Bigl(\sum_{0\le t<K}\omega^{st}a_t\Bigr) \bmod M, \qquad 0 \le s < K, \tag{37}$$
can be calculated rapidly as follows. (In these formulas the $s_j$ and $t_j$ are either 0 or 1, so that each step represents $2^k$ computations.)

Step 0. Let $A^{[0]}(t_{k-1}, \ldots, t_0) = a_t$, where $t = (t_{k-1}\ldots t_0)_2$.

Step 1. Set
$$A^{[1]}(s_{k-1}, t_{k-2}, \ldots, t_0) = \bigl(A^{[0]}(0, t_{k-2}, \ldots, t_0) + \omega^{(s_{k-1}0\ldots0)_2}\cdot A^{[0]}(1, t_{k-2}, \ldots, t_0)\bigr) \bmod M.$$

Step 2. Set
$$A^{[2]}(s_{k-1}, s_{k-2}, t_{k-3}, \ldots, t_0) = \bigl(A^{[1]}(s_{k-1}, 0, t_{k-3}, \ldots, t_0) + \omega^{(s_{k-2}s_{k-1}0\ldots0)_2}\cdot A^{[1]}(s_{k-1}, 1, t_{k-3}, \ldots, t_0)\bigr) \bmod M.$$

Step $k$. Set
$$A^{[k]}(s_{k-1}, \ldots, s_1, s_0) = \bigl(A^{[k-1]}(s_{k-1}, \ldots, s_1, 0) + \omega^{(s_0s_1\ldots s_{k-1})_2}\cdot A^{[k-1]}(s_{k-1}, \ldots, s_1, 1)\bigr) \bmod M.$$

It is not difficult to prove by induction that we have
$$A^{[j]}(s_{k-1}, \ldots, s_{k-j}, t_{k-j-1}, \ldots, t_0) = \Bigl(\sum_{0\le t_{k-1},\ldots,t_{k-j}\le1}\omega^{(s_0s_1\ldots s_{k-1})_2\cdot(t_{k-1}\ldots t_{k-j}0\ldots0)_2}\,a_t\Bigr) \bmod M,$$
so that
$$A^{[k]}(s_{k-1}, \ldots, s_1, s_0) = \hat a_s, \qquad\text{where } s = (s_0s_1\ldots s_{k-1})_2.$$
(Note the reversed order of the binary digits in $s$. For further discussion of transforms such as this, see Section 4.6.4.)

Now we have enough machinery at our disposal to do the calculation of all $W_r$ as promised. Let $\omega = 2^{2^{l+3-k}}$, so that $\omega^K = 2^{8L} \equiv 1 \pmod M$, where $M = 2^{4L} + 1$. The integer fast Fourier transform algorithm above can be applied to $(U_0, \ldots, U_{K-1})$ to obtain $(\hat U_0, \ldots, \hat U_{K-1})$; each of the $k$ steps involves $2^k$ computations of the form $c = (a + 2^eb) \bmod M$, so the running time is $O(k2^kL) = O(kN)$, since $2^kL = 2^{k+l} = 2N$. Similarly we obtain $(\hat V_0, \ldots, \hat V_{K-1})$ in $O(kN)$ steps. The next step is to compute
$$(a_0, a_1, \ldots, a_{K-1}) = (\hat U_0\hat V_0, \hat U_1\hat V_1, \ldots, \hat U_{K-1}\hat V_{K-1}) \bmod M,$$
using a high-speed multiplication procedure for each of these products, obtaining the results mod $M$ by subtracting the most significant halves from the least significant halves. If we now use the fast Fourier transform a third time, obtaining $(\hat a_0, \hat a_1, \ldots, \hat a_{K-1})$, this is enough to determine $(W_0, W_1, \ldots, W_{K-1})$ without much more work, since we shall prove that
$$2^kW_r \equiv \hat a_{(-r)\bmod K} \pmod M. \tag{38}$$
This congruence means that an appropriate shifting operation, namely to multiply $-\hat a_{(-r)\bmod K}$ by $2^{4L-k}$ mod $M$ as in (36), finally yields $W_r$.
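The whole procedure fits in a short Python sketch. It follows Steps 0 through $k$ literally (including the bit-reversed output order), multiplies by powers of $\omega = 2^g$ as shifts, recovers each $W_r$ from (38) by the shift just described, and uses the choice (40) of $k$ and $l$ given below. It is an illustration under stated simplifications, not an efficient implementation: the $K$ pointwise products are done with Python's built-in multiplication instead of a recursive call, and the names `bit_reverse`, `integer_fft`, and `fft_multiply` are hypothetical.

```python
def bit_reverse(x, bits):
    """Reverse the low `bits` binary digits of x."""
    y = 0
    for _ in range(bits):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y

def integer_fft(a, g, m, k):
    """Transform (37) modulo M = 2^m + 1 with omega = 2^g, by Steps 0, 1, ..., k."""
    M, K = (1 << m) + 1, 1 << k
    A = [x % M for x in a]                         # Step 0: A[t] = a_t
    for j in range(1, k + 1):                      # Step j turns digit t_p into s_p
        p = k - j
        B = [0] * K
        for idx in range(K):
            lo = A[idx & ~(1 << p)]                # argument with t_p = 0
            hi = A[idx | (1 << p)]                 # argument with t_p = 1
            e = bit_reverse(idx >> p, j) << p      # (s_p s_{p+1} ... s_{k-1} 0...0)_2
            B[idx] = (lo + (hi << (g * e % (2 * m)))) % M   # omega^e = 2^(g e), a shift
        A = B
    hat = [0] * K
    for idx in range(K):                           # A[(s_{k-1}...s_0)_2] = â_s,
        hat[bit_reverse(idx, k)] = A[idx]          # where s = (s_0 ... s_{k-1})_2
    return hat

def fft_multiply(u, v, n):
    """u * v for 0 <= u, v < 2^N, N = 2^n, n >= 4, following (33) to (40)."""
    k, l = n // 2 + 2, (n + 1) // 2 - 1            # choice (40)
    K, L = 1 << k, 1 << l
    m = 4 * L                                      # M = 2^(4L) + 1
    M = (1 << m) + 1
    g = 1 << (l + 3 - k)                           # omega = 2^(2^(l+3-k))
    mask = (1 << L) - 1
    U = [(u >> (L * i)) & mask for i in range(K // 2)] + [0] * (K // 2)
    V = [(v >> (L * i)) & mask for i in range(K // 2)] + [0] * (K // 2)
    Uh, Vh = integer_fft(U, g, m, k), integer_fft(V, g, m, k)
    a = [x * y % M for x, y in zip(Uh, Vh)]        # the K products, mod M
    ah = integer_fft(a, g, m, k)
    # (38): 2^k W_r ≡ â_{(-r) mod K}; as 2^m ≡ -1, W_r ≡ -â_{(-r) mod K} * 2^(m-k)
    W = [(-(ah[-r % K] << (m - k))) % M for r in range(K)]
    return sum(w << (r * L) for r, w in enumerate(W))   # Eq. (34)

if __name__ == "__main__":
    u = 3 ** 2500 % (1 << 4096)
    v = 7 ** 1400 % (1 << 4096)
    print(fft_multiply(u, v, 12) == u * v)
```

For $n = 12$ the sketch uses exactly the parameters of the numerical example above: $k = 8$, $l = 5$, $\omega = 2$, $M = 2^{128} + 1$. Because every $W_r$ is less than $M$, the residue recovered from (38) is $W_r$ itself.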
All this may seem like magic, but it works; a careful study of the above remarks will show that the method is very clever but not a complete mystery. The proof of (38) relies primarily on the fact that $\omega^{K/2} \equiv -1 \pmod M$, because this fact can be used to prove that
$$\sum_{0\le t<K}\omega^{st} \equiv \begin{cases} K, & \text{if } s \bmod K = 0;\\ 0, & \text{if } s \bmod K \ne 0.\end{cases} \tag{39}$$
For when $s \bmod K \ne 0$, let $s \bmod K = 2^pq$, where $q$ is odd and $0 \le p < k$. Setting $T = 2^{k-1-p}$, we have $\omega^{sT} \equiv \omega^{qK/2} \equiv -1$; hence $\omega^{s(t+T)} \equiv -\omega^{st}$ for all $t$, and the terms of (39) cancel in pairs (pair each $t$ having $t \bmod 2T < T$ with $t + T$), so the sum is $\equiv 0$. Consequently
$$\begin{aligned}
\hat a_s &\equiv \sum_{0\le t<K}\omega^{st}\hat U_t\hat V_t \equiv \sum_{0\le t,i,j<K}\omega^{st}\,\omega^{ti}U_i\,\omega^{tj}V_j\\
&\equiv \sum_{0\le i,j<K}U_iV_j\sum_{0\le t<K}\omega^{(s+i+j)t} \equiv K\sum_{\substack{0\le i,j<K\\ i+j+s\equiv0 \pmod K}}U_iV_j,
\end{aligned}$$
and by (35) the last sum is $W_{(-s)\bmod K}$; since $K = 2^k$, this proves (38).
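Relation (39) and the pairing argument can also be confirmed numerically for the parameters of the 4096-bit example ($K = 256$, $\omega = 2$, $M = 2^{128} + 1$). The following Python sketch is only a spot check of those claims; the helper names `root_sum` and `pairing_shift` are hypothetical.

```python
def root_sum(s, omega, K, M):
    """Left-hand side of (39): sum_{0<=t<K} omega^(s t) mod M."""
    return sum(pow(omega, s * t, M) for t in range(K)) % M

def pairing_shift(s, k):
    """For s mod 2^k != 0, write s mod 2^k = 2^p q with q odd; return T = 2^(k-1-p)."""
    r = s % (1 << k)
    p = (r & -r).bit_length() - 1          # number of trailing zero bits of r
    return 1 << (k - 1 - p)

if __name__ == "__main__":
    k, omega, M = 8, 2, (1 << 128) + 1
    K = 1 << k
    print(pow(omega, K // 2, M) == M - 1)  # omega^(K/2) ≡ -1
    for s in (0, 1, 2, 3, 96, 255, 256, 300):
        print(s, root_sum(s, omega, K, M) == (K if s % K == 0 else 0))
```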
The multiplication procedure is nearly complete; it remains for us to specify $k$ and $l$, and to total up the amount of work involved. Let $M(n)$ denote the time it takes to multiply $2^n$-bit numbers by the above method, and let $M'(n) = M(n)/2^n$. The calculation time involves $O(kN)$ steps for the three Fourier transforms and the other operations of negligible cost, plus $2^k$ multiplications of integers in the interval $[0, 2^{4L}]$; hence we have
$$M(n) = 2^kM(l + 2) + O(kN); \qquad M'(n) = 2M'(l + 2) + O(k).$$
We get the best reduction of $M'(n)$ when $l$ is chosen to be as low as possible, consistent with (33), so we set
$$k = \lfloor n/2\rfloor + 2, \qquad l = \lceil n/2\rceil - 1. \tag{40}$$
We have proved that there is a constant $C$ such that
$$M'(n) \le 2M'\bigl(\lceil(n-2)/2\rceil + 2\bigr) + Cn, \qquad\text{for all } n \ge 4.$$
Iterating this relation (cf. exercise 1.2.4–35) yields
$$M'(n) \le 2^jM'\bigl(\lceil(n-2)/2^j\rceil + 2\bigr) + C\bigl(2^{j-1}j\lceil(n-2)/2^{j-1}\rceil + 2^{j+1} - 2\bigr)$$
for $j = 1, 2, \ldots, \lceil\lg(n-2)\rceil$; and $j = \lceil\lg(n-2)\rceil$ yields $M'(n) = O(n\log n)$.
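To see the growth implied by the inequality for $M'(n)$, one can iterate it numerically as if it were an equality. The Python sketch below takes $C = 1$ and a base value $M'(n) = 1$ for $n \le 3$, both arbitrary assumptions, and reports how $M'(n)/(n\lg n)$ behaves; the name `m_prime` is hypothetical.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def m_prime(n):
    """M'(n) = 2 M'(ceil((n-2)/2) + 2) + C n with C = 1; hypothetical base M'(n) = 1 for n <= 3."""
    if n <= 3:
        return 1
    return 2 * m_prime(-(-(n - 2) // 2) + 2) + n     # -(-x // 2) is ceil(x / 2)

if __name__ == "__main__":
    for n in (4, 16, 256, 4096, 65536):
        print(n, m_prime(n) / (n * math.log2(n)))     # stays bounded
```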
We have proved the main result of this section:

Theorem S (A. Schönhage, V. Strassen). It is possible to multiply two $n$-bit numbers in $O(n\log n\log\log n)$ steps.

Our formulation of the multiplication procedure was designed primarily for simplicity of exposition; it does not turn out to be an especially fast method for small $n$. For example, a lot of the bits that the above method deals with are known to be zero. Thus the algorithm needs to be refined somewhat if it is ever to become competitive with Algorithm C when $n$ is in a practical range. As $n \to \infty$, of course, fast Fourier multiplication becomes vastly superior to Algorithm C. John Pollard has presented a fast Fourier multiplication algorithm that is useful for moderately large $n$, in Math. Comp. 25 (1971), 365–374.

The word "steps" in Theorem S has been used somewhat loosely; we have implicitly been assuming a "conventional computer" with unlimited random-access memory, which takes one unit of time to read and write any bit. This assumption is quite unrealistic as $n \to \infty$, since we need $O(\log n)$ bits in an instruction or an index register just to be able to distinguish between $n$ memory cells, so the actual time to access memory on a "conventional computer" is really proportional to $\log n$. We often forget this dependence on $n$ because it does not occur on real machines with bounded memory and bounded register size. When $n$ becomes really large, the only physically appropriate model seems to be a finite memory with a finite number of arbitrarily long tapes; the fast Fourier …
15911 {A20}{H10L12M29}!|9|4|1|1|1The di=erence between
15914 these computer models can be clari_ed by considering
15922 another method due to Sch|=4onhage and Strassen:
15929 If |εn|4α=↓|42|gm|4|¬O|4m, |πso that |εm|4|¬V|4|πlg|4|εn
15934 |πand |ε2|gm|4|¬V|4n/|πlg|4|εn, |πit is possible
15939 to use the fast Fourier transform over the complex
15948 numbers to compute the product of two |εn-|πbit
15956 numbers by doing |εO(m|4|¬O|42|gm) |πmultiplications
15961 of |ε6m-|πbit numbers. Each of the latter can
15969 be broken into 12|g2|4α=↓|4144 multiplications
15974 of (|f1|d32|)|εm)-|πbit numbers. Now we can construct
15981 a multiplication table containing all products
15987 |εxy |πwith |ε0|4|¬E|4x,|4y|4|¬W|42|g(|g1|g/|g2|g)|gm,
15990 |πby repeated addition, in |εO(m|4|¬O|42|g(|g1|g/|g2|g)|gm|4
15994 |¬O|42|g(|g1|g/|g2|g)|gm) |πsteps; then each
15998 of the |εO(m|4|¬O|42|gm) |πneeded products can
16004 be done by table lookup in |εO(m) |πsteps. The
16013 total number of steps for |εthis |πprocedure
16020 therefore comes to |εO(m|g22|gm)|4α=↓|4O(n|4|πlog|4|εn);
16024 |πwe have gotten rid of the factor log log |εn
16034 |πin Theorem S, but the method really |εrequires
16042 |πan unbounded random-access memory since the
16048 table lookup cannot be done e∃ciently with a
16056 _nite number of tapes. (Of course, a factor of
16065 log log |εn |πis utterly negligible in practice;
16073 when |εn |πchanges from 10|g9 |πto 10|g1|g8,
16080 lg|4lg|4|εn |πincreases by only one.)|'!|9|4|1|1|1Perhaps
$O(n \log n \log\log n)$ will turn out to be the fastest achievable multiplication speed on the tape model, and $O(n \log n)$ on the unlimited random-access model; no such result has yet been proved. The best lower bound known to date is a rather deep theorem proved by Michael S. Paterson, Michael J. Fischer, and Albert R. Meyer [SIAM-AMS Proceedings 7 (1974), 97–111], based on techniques originally introduced by S. A. Cook and S. Aanderaa, that under certain restrictions there is no algorithm which multiplies $n$-bit numbers with an average of less than
$$O(n \log n / \log\log n) \qquad (41)$$
operations. The restrictions under which (41) is a lower bound are rather severe: (a) the $(k+1)$st input symbols of the operands, from right to left, must not be read by the algorithm until after the $k$th output symbol has been produced; and (b) the internal tables kept by the algorithm must have a ``uniform'' structure, in an appropriate sense. The latter restriction rules out algorithms which use general List structures for their internal tables, and the first restriction rules out both Algorithm C and Algorithm S. It is still conceivable (though unlikely) that an algorithm which violates (a) or (b) could multiply $n$-bit numbers in $O(n)$ cycles. M. J. Fischer and L. J. Stockmeyer have shown [J. Computer and System Sciences 9 (1974), 317–331] that multiplication under restrictions (a) and (b) is possible in $O(n(\log n)^2 \log\log n)$ steps.

D. Division. Using a fast multiplication routine,
we can now show that division can also be accomplished in $O(n \log n \log\log n)$ cycles.

To divide an $n$-bit number $u$ by an $n$-bit number $v$, we may first find an $n$-bit approximation to $1/v$, then multiply by $u$ to get an approximation $\hat q$ to $u/v$, and, finally, make the slight correction to $\hat q$ that is necessary to ensure that $0 \le u - qv < v$, using another multiplication. From this reasoning, we see that it suffices to have an algorithm which approximates the reciprocal of an $n$-bit number in $O(n \log n \log\log n)$ cycles. The following algorithm achieves this, using ``Newton's method'' as explained at the end of Section 4.3.1:

Algorithm R (High-precision reciprocal). Let $v$ have the binary representation $v = (0.v_1v_2v_3\ldots)_2$, where $v_1 = 1$. This algorithm computes an approximation $z$ to $1/v$, which satisfies
$$|z - 1/v| \le 2^{-n}.$$

R1. [Initial approximation.] Set $z \leftarrow \frac14\lfloor 32/(4v_1 + 2v_2 + v_3)\rfloor$, $k \leftarrow 0$.

R2. [Newtonian iteration.] (At this point we have a number $z$ of the binary form $xx.xx\ldots x$ with $2^k + 1$ places after the radix point, and $z \le 2$.) Calculate $z^2 = xxx.xx\ldots x$ exactly, using a high-speed multiplication routine. Then calculate $V_k z^2$ exactly, where $V_k = (0.v_1v_2\ldots v_{2^{k+1}+3})_2$. Then set $z \leftarrow 2z - V_k z^2 + r$, where $0 \le r < 2^{-2^{k+1}-1}$ is added if necessary to ``round up'' $z$ so that it is a multiple of $2^{-2^{k+1}-1}$. Finally, set $k \leftarrow k + 1$.

R3. [Test for end.] If $2^k < n$, go back to step R2; otherwise the algorithm terminates.
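Algorithm R and the division scheme described just before it can be tried out in a few lines. The Python sketch below is only an illustration under stated assumptions:

* Exact `fractions.Fraction` arithmetic stands in for the high-speed multiplication routine the text assumes, so it is not fast.
* $v$ is passed as a Fraction already normalized to $\frac12 \le v < 1$.
* The inequalities (42), established in the accuracy proof, are checked with `assert` on every pass through step R2.
* The function names `reciprocal_r` and `divide` are hypothetical.
* `divide` goes slightly beyond the text in two ways, both our own choices: it accepts a dividend of up to $2n$ bits, so that quotients of up to $n+1$ bits occur, and it asks Algorithm R for $n+2$ bits of the reciprocal.

```python
import math
from fractions import Fraction


def reciprocal_r(v, n):
    """Sketch of Algorithm R: return z with |z - 1/v| <= 2**-n.

    v is a Fraction with 1/2 <= v < 1, i.e. v = (0.v1 v2 v3 ...)_2 with v1 = 1.
    Exact Fraction arithmetic stands in for a high-speed multiplication routine.
    """
    assert Fraction(1, 2) <= v < 1 and n >= 1
    # R1. [Initial approximation.]  v' = (v1 v2 v3)_2 = floor(8v).
    v_prime = math.floor(8 * v)
    z = Fraction(32 // v_prime, 4)
    k = 0
    while True:
        # Inequalities (42): z <= 2 and |z - 1/v| <= 2**(-2**k).
        assert z <= 2 and abs(z - 1 / v) <= Fraction(1, 2 ** (2 ** k))
        # R2. [Newtonian iteration.]  V_k keeps the first 2**(k+1) + 3 bits of v.
        m = 2 ** (k + 1) + 3
        v_k = Fraction(math.floor(v * 2 ** m), 2 ** m)
        t = 2 * z - v_k * z * z
        # Add the smallest r >= 0 making z a multiple of 2**(-2**(k+1) - 1).
        unit = Fraction(1, 2 ** (2 ** (k + 1) + 1))
        z = math.ceil(t / unit) * unit
        k += 1
        # R3. [Test for end.]
        if 2 ** k >= n:
            break
    assert z <= 2 and abs(z - 1 / v) <= Fraction(1, 2 ** (2 ** k))
    return z


def divide(u, v, n):
    """Return (q, u - q*v) for 0 <= u < 2**(2n) and an n-bit v, via a reciprocal."""
    assert 0 <= u < 2 ** (2 * n) and 2 ** (n - 1) <= v < 2 ** n
    z = reciprocal_r(Fraction(v, 2 ** n), n + 2)  # z approximates 2**n / v
    q = math.floor(u * z / 2 ** n)                 # q-hat approximates u / v
    # The slight final correction, so that 0 <= u - q*v < v.
    while q * v > u:
        q -= 1
    while u - q * v >= v:
        q += 1
    return q, u - q * v
```

For example, `divide(200, 13, 4)` gives `(15, 5)`, matching `divmod(200, 13)`. Each pass through R2 roughly squares the error (the $v\delta_k^2$ term in the analysis of Algorithm R), so only about $\lg n$ iterations are needed.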
This algorithm is a modification of a method suggested by S. A. Cook. A similar technique has been used in computer hardware [see Anderson, Earle, Goldschmidt, and Powers, IBM J. Res. Dev. 11 (1967), 48–52]. Of course, it is necessary to check the accuracy of Algorithm R quite carefully, because it comes very close to being inaccurate. We will prove by induction that
$$z \le 2 \quad\text{and}\quad |z - 1/v| \le 2^{-2^k} \qquad (42)$$
at the beginning and end of step R2.

For this purpose, let $\delta_k = 1/v - z_k$, where $z_k$ is the value of $z$ after $k$ iterations of step R2. To start the induction on $k$, we have
$$\delta_0 = 1/v - 8/v' + (32/v' - \lfloor 32/v' \rfloor)/4 = \eta_1 + \eta_2,$$
where $v' = (v_1v_2v_3)_2$, $\eta_1 = (v' - 8v)/vv'$ satisfies $-\frac12 < \eta_1 \le 0$, and $0 \le \eta_2 < \frac14$. Hence $|\delta_0| < \frac12$. Now suppose (42) has been verified for $k$; then
$$\begin{aligned}
\delta_{k+1} = 1/v - z_{k+1} &= 1/v - z_k - z_k(1 - z_k V_k) - r\\
&= \delta_k - z_k(1 - z_k v) - z_k^2(v - V_k) - r\\
&= \delta_k - (1/v - \delta_k)v\delta_k - z_k^2(v - V_k) - r\\
&= v\delta_k^2 - z_k^2(v - V_k) - r.
\end{aligned}$$
Now
$$0 \le v\delta_k^2 \le \delta_k^2 \le (2^{-2^k})^2 = 2^{-2^{k+1}},$$
and
$$0 \le z_k^2(v - V_k) + r < 4(2^{-2^{k+1}-3}) + 2^{-2^{k+1}-1} = 2^{-2^{k+1}},$$
so $|\delta_{k+1}| \le 2^{-2^{k+1}}$. We must still verify the first inequality of (42); to show that $z_{k+1} \le 2$, there are three cases: (a) $V_k = \frac12$; then $z_{k+1} = 2$. (b) $V_k \ne \frac12 = V_{k-1}$; then $z_k = 2$, so $2z_k - z_k^2 V_k \le 2 - 2^{-2^{k+1}-1}$. (c) $V_{k-1} \ne \frac12$; then $z_{k+1} = 1/v - \delta_{k+1} < 2 - 2^{-2^k-2} + 2^{-2^{k+1}} \le 2$, since $k > 0$.

The running time of
Algorithm R is bounded by
$$2T(4n) + 2T(2n) + 2T(n) + 2T(\tfrac12 n) + \cdots$$
steps, where $T(n)$ is an upper bound on the time needed to do a multiplication of $n$-bit numbers. When $T(n) = Cn \log n \log\log n$, we have $T(4n) + T(2n) + T(n) + \cdots < T(8n)$, so division can be done with a speed comparable to that of multiplication except for a constant factor.

E. An even faster multiplication method. It is
natural to wonder if multiplication of $n$-bit numbers can actually be accomplished in just $n$ steps; we have come from $n^2$ down to $n^{1+\epsilon}$, so perhaps we can squeeze the time down even more. This is still an unsolved problem, as pointed out above, but it is interesting to note that the best possible time, exactly $n$ cycles, can be achieved if we leave the domain of conventional computer programming and allow ourselves to build a computer which has an unlimited number of components all acting at once.

A linear iterative array of automata is a set of devices $M_1, M_2, M_3, \ldots$ which can each be in a finite set of ``states'' at each step of the computation. The machines $M_2, M_3, \ldots$ all have identical circuitry, and their state at time $t + 1$ is a function of their own state at time $t$ as well as the states of their left and right neighbors at time $t$. The first machine $M_1$ is slightly different: its state at time $t + 1$ is a function of its own state and that of $M_2$ at time $t$, and also of the input at time $t$. The output of a linear iterative array is a function defined on the states of $M_1$.

Let $u = (u_{n-1}\ldots u_1u_0)_2$, $v = (v_{n-1}\ldots v_1v_0)_2$, and $q = (q_{n-1}\ldots q_1q_0)_2$ be binary numbers, and let $uv + q = w = (w_{2n-1}\ldots w_1w_0)_2$. It is a remarkable fact that a linear iterative array can be constructed, independent of $n$, which will output $w_0, w_1, w_2, \ldots$ at times $1, 2, 3, \ldots$, if it is given the inputs $(u_0, v_0, q_0), (u_1, v_1, q_1), (u_2, v_2, q_2), \ldots$ at times $0, 1, 2, \ldots$.

We
can state this phenomenon in the language of computer hardware, by saying that it is possible to design a single ``integrated circuit module'' with the following property: If we wire together sufficiently many of these devices in a straight line, with each module communicating only with its left and right neighbors, the resulting circuitry will produce the $2n$-bit product of $n$-bit numbers in exactly $2n$ clock pulses.

Here is the basic idea behind this construction: At time 0, $M_1$ senses $(u_0, v_0, q_0)$ and it therefore is able to output $(u_0v_0 + q_0) \bmod 2$ at time 1. Then it sees $(u_1, v_1, q_1)$ and it can output $(u_0v_1 + u_1v_0 + q_1 + k_1) \bmod 2$, where $k_1$ is the ``carry'' left over from the previous step, at time 2. Next it sees $(u_2, v_2, q_2)$ and outputs $(u_0v_2 + u_1v_1 + u_2v_0 + q_2 + k_2) \bmod 2$; furthermore, its state records the values of $u_2$ and $v_2$ so that machine $M_2$ will be able to sense these values at time 3, and $M_2$ will be able to compute $u_2v_2$ for the benefit of $M_1$ at time 4. Thus $M_1$ arranges to start $M_2$ multiplying the sequence $(u_2, v_2), (u_3, v_3), \ldots$, and $M_2$ will ultimately give $M_3$ the job of multiplying $(u_4, v_4), (u_5, v_5)$, etc. For
folio 394 galley 17
Each automaton has $2^{11}$ states $(c, x_0, y_0, x_1, y_1, x, y, z_2, z_1, z_0)$, where $0 \le c < 4$ and each of the $x$'s, $y$'s, and $z$'s is either 0 or 1. Initially, all devices are in state $(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)$. Suppose that a machine $M_j$, $j > 1$, is in state $(c, x_0, y_0, x_1, y_1, x, y, z_2, z_1, z_0)$ at time $t$. Suppose also that, at that time, its left neighbor $M_{j-1}$ is in state $(c^l, x_0^l, y_0^l, x_1^l, y_1^l, x^l, y^l, z_2^l, z_1^l, z_0^l)$ and its right neighbor $M_{j+1}$ is in state $(c^r, x_0^r, \ldots, z_0^r)$. Then at time $t + 1$, $M_j$ goes into state $(c', x_0', y_0', x_1', y_1', x', y', z_2', z_1', z_0')$, where
$$\begin{aligned}
c' &= \min(c + 1, 3) \ \text{if } c^l = 3, \quad 0 \ \text{otherwise};\\
x_0' &= x^l \ \text{if } c = 0, \quad x_0 \ \text{otherwise};\\
y_0' &= y^l \ \text{if } c = 0, \quad y_0 \ \text{otherwise};\\
x_1' &= x^l \ \text{if } c = 1, \quad x_1 \ \text{otherwise};\\
y_1' &= y^l \ \text{if } c = 1, \quad y_1 \ \text{otherwise};\\
x' &= x^l \ \text{if } c \ge 2, \quad x \ \text{otherwise};\\
y' &= y^l \ \text{if } c \ge 2, \quad y \ \text{otherwise};
\end{aligned} \qquad (43)$$
and $(z_2'z_1'z_0')_2$ is the binary notation for
$$z_0^r + z_1 + z_2^l + \begin{cases}
x^l y^l, & \text{if } c = 0;\\
x_0 y^l + x^l y_0, & \text{if } c = 1;\\
x_0 y^l + x_1 y_1 + x^l y_0, & \text{if } c = 2;\\
x_0 y^l + x_1 y + x y_1 + x^l y_0, & \text{if } c = 3.
\end{cases} \qquad (44)$$

The
leftmost machine $M_1$ behaves in almost the same way as the others; it acts exactly as if there were a machine to its left in state $(3, 0, 0, 0, 0, u, v, q, 0, 0)$ when it is receiving the inputs $(u, v, q)$. The output of the array is the $z_0$ component of $M_1$.

Table 1 shows an example of this array acting on the inputs $u = v = (\ldots 00010111)_2$, $q = (\ldots 00001011)_2$. The output sequence appears in the $z_0$ components of the successive states of $M_1$: $0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, \ldots$, representing the number $(\ldots 01000011100)_2$ from right to left.

This
construction is based on a similar one first published by A. J. Atrubin, IEEE Transactions EC-14 (1965), 394–399. S. Winograd [JACM 14 (1967), 793–802] has investigated the minimum multiplication time achievable in a logical circuit when $n$ is given and when the inputs are available all at once in coded form; see also C. S. Wallace, IEEE Trans. EC-13 (1964), 14–17.
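The arithmetic behind the basic idea described above can be sketched sequentially in Python. In that idea, $M_1$ emits bit $w_t$ of $uv + q$ at time $t + 1$, after adding together three things:

* the column sum $\sum_{i+j=t} u_i v_j$,
* the input bit $q_t$,
* the carry $k_t$ left over from earlier columns.

The helper below is hypothetical (`serial_output_bits` is our name). It forms each column sum directly, so it does not model how the array hands the products $u_i v_j$ with $i, j \ge 2$ over to $M_2, M_3, \ldots$.

```python
def serial_output_bits(u, v, q, n):
    """Bits w_0, w_1, ..., w_{2n-1} of u*v + q, produced one column per time step."""
    def bit(x, i):
        return (x >> i) & 1

    carry = 0  # the carry k_t left over from the previous columns
    out = []
    for t in range(2 * n):
        # Column t: all products u_i v_j with i + j = t, plus q_t and the carry.
        s = carry + bit(q, t) + sum(bit(u, i) * bit(v, t - i) for i in range(t + 1))
        out.append(s & 1)  # w_t, which M1 emits at time t + 1
        carry = s >> 1
    return out
```

With the inputs used in Table 1, `serial_output_bits(23, 23, 11, 5)` returns `[0, 0, 1, 1, 1, 0, 0, 0, 0, 1]`, the first ten output bits listed in the text.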
R. P. Brent has shown that functions such as $\log x$, $\exp x$, and $\arctan x$ can be evaluated to $n$ significant bits in $O(n(\log n)^2 \log\log n)$ steps, using high-speed multiplication [JACM, to appear].

EXERCISES
1. [??] The idea expressed in (2) can be generalized to the decimal system, if the radix 2 is replaced by 10. Using this generalization, calculate 2718 times 4742 (reducing this product of four-digit numbers to three products of two-digit numbers, and reducing each of the latter to products of one-digit numbers).

2. [M??] Prove that, in step C1 of Algorithm C, the value of $R$ either stays the same or increases by one when we set $R \leftarrow \lfloor\sqrt{Q}\rfloor$. (Therefore, as observed in that step, we need not calculate a square root.)
3. [M??] Prove that the sequences $q_k$, $r_k$ defined in Algorithm C satisfy the inequality $2^{q_k+1}(2r_k)^{r_k} \le 2^{q_{k-1}+q_k}$, when $k > 0$.
4. [M??] (K. Baker.) Show that it is advantageous to evaluate the polynomial $W(x)$ at the points $x = -r, \ldots, 0, \ldots, r$ instead of at the points $x = 0, 1, \ldots, 2r$ as in Algorithm C. The polynomial $U(x)$ can be written $U(x) = U_e(x^2) + xU_o(x^2)$, and similarly $V(x)$ and $W(x)$ can be expanded in this way; show how to exploit this idea, obtaining faster calculations in steps C7 and C8.

5. [HM??] Show that if in step C1 of Algorithm C we set $R \leftarrow \lceil\sqrt{2Q}\rceil + 1$ instead of $R \leftarrow \lfloor\sqrt{Q}\rfloor$, with suitable initial values of $q_0$, $q_1$, $r_0$, and $r_1$, then (19) can be improved to $t_k \le q_{k+1} 2^{\sqrt{2\log_2 q_{k+1}}}(\log_2 q_{k+1})$.

6. [M??] Prove that the six numbers in (22) are relatively prime in pairs.

7. [M??]
Prove (23).

8. [M??] Why does the fast Fourier multiplication algorithm bother to work mod $(2^N + 1)$ instead of mod $(2^N - 1)$? It would seem to be much simpler to do everything mod $(2^N - 1)$, avoiding a lot of miscellaneous minus signs in the formulas, since $\omega = 2$ can be used to compute fast Fourier transforms mod $(2^{2^n} - 1)$. What would go wrong?

9. [M??] What is $\hat u_r$ (the result of two successive Fourier transforms (32))?

10. [M??] Where is condition (33) used?

11. [M??] If $n$ is fixed, how many of the automata in the linear iterative array (43), (44) are needed to compute the product of $n$-bit numbers? (Note that the automaton $M_j$ is only influenced by the component $z_0^r$ of the machine on its right, so we may remove all automata whose $z_0$ component is always zero whenever the inputs are $n$-bit numbers.)

12. [M??] Improve on the lower bound (41); is it impossible for a general node-structure automaton (as described in Section 2.6) to multiply $n$-bit numbers in $O(n)$ cycles?

13. [M??] (A. Schönhage.) What is a good upper bound on the time needed to multiply an $m$-bit number by an $n$-bit number, when both $m$ and $n$ are very large but $n$ is much larger than $m$, based on the results proved in this section for $m = n$?

14. [M??] Write a program for Algorithm C, incorporating the improvements of exercise 4. Compare it with a program for Algorithm 4.3.1M and with a program based on (2), to see how large $n$ must be before Algorithm C is an improvement.

4.4. RADIX CONVERSION

If men had invented arithmetic by counting with their two fists or their eight fingers, instead of their ten ``digits,'' we would never have to worry about writing binary-decimal conversion routines. (And we would perhaps never have learned as much about number systems.) In this section, we shall discuss the conversion of
folio 395 galley 18
Table 1
MULTIPLICATION IN A LINEAR ITERATIVE ARRAY

(Each module state is listed as c, then x = x0 x1 x, y = y0 y1 y, and z = z2 z1 z0. The output of the array is z0 of M1, the last digit of M1's z field.)

        Input        Module M1           Module M2           Module M3
Time   u  v  q    c   x   y   z       c   x   y   z       c   x   y   z
  0    1  1  1    0  000 000 000      0  000 000 000      0  000 000 000
  1    1  1  1    1  100 100 010      0  000 000 000      0  000 000 000
  2    1  1  0    2  110 110 100      0  000 000 000      0  000 000 000
  3    0  0  1    3  111 111 011      0  000 000 001      0  000 000 000
  4    1  1  0    3  110 110 101      1  100 100 001      0  000 000 000
  5    0  0  0    3  111 111 011      2  100 100 001      0  000 000 000
  6    0  0  0    3  110 110 100      3  101 101 010      0  000 000 000
  7    0  0  0    3  110 110 000      3  100 100 010      1  100 100 001
  8    0  0  0    3  110 110 000      3  100 100 010      2  100 100 000
  9    0  0  0    3  110 110 000      3  100 100 001      3  100 100 000
 10    0  0  0    3  110 110 001      3  100 100 000      3  100 100 000
 11    0  0  0    3  110 110 000      3  100 100 000      3  100 100 000
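The rules (43) and (44), together with the special rule for $M_1$, are easy to simulate, and doing so regenerates Table 1. The following Python sketch makes these choices:

* Each state is a 10-tuple $(c, x_0, y_0, x_1, y_1, x, y, z_2, z_1, z_0)$, and all machines are updated simultaneously.
* The last machine simulated has no right neighbor and is given $z_0^r = 0$.
* The names `array_step` and `array_outputs` are hypothetical.
* The array length `steps + 1` is simply a safe choice for a finite simulation, not the minimum that exercise 11 asks about.

```python
def array_step(states, u_t, v_t, q_t):
    """One clock pulse of the linear iterative array defined by (43) and (44)."""
    # M1 acts as if its left neighbor were in state (3, 0, 0, 0, 0, u, v, q, 0, 0).
    virtual_left = (3, 0, 0, 0, 0, u_t, v_t, q_t, 0, 0)
    new_states = []
    for j, (c, x0, y0, x1, y1, x, y, z2, z1, z0) in enumerate(states):
        left = states[j - 1] if j > 0 else virtual_left
        cl, xl, yl, z2l = left[0], left[5], left[6], left[7]
        z0r = states[j + 1][9] if j + 1 < len(states) else 0
        # Rule (43): the counter c and the stored operand bits.
        c_new = min(c + 1, 3) if cl == 3 else 0
        x0n, y0n = (xl, yl) if c == 0 else (x0, y0)
        x1n, y1n = (xl, yl) if c == 1 else (x1, y1)
        xn, yn = (xl, yl) if c >= 2 else (x, y)
        # Rule (44): the partial products that depend on the current value of c.
        if c == 0:
            prod = xl * yl
        elif c == 1:
            prod = x0 * yl + xl * y0
        elif c == 2:
            prod = x0 * yl + x1 * y1 + xl * y0
        else:
            prod = x0 * yl + x1 * y + x * y1 + xl * y0
        total = z0r + z1 + z2l + prod  # at most 7, so three bits suffice
        new_states.append((c_new, x0n, y0n, x1n, y1n, xn, yn,
                           (total >> 2) & 1, (total >> 1) & 1, total & 1))
    return new_states


def array_outputs(u, v, q, steps):
    """Outputs (z0 of M1) at times 1, 2, ..., steps, feeding bits LSB first."""
    # An all-zero machine with all-zero neighbors stays all-zero, so machine j
    # (0-based) is still zero at time j; steps + 1 machines therefore behave
    # exactly like an unbounded array for this many steps.
    states = [(0,) * 10] * (steps + 1)
    outputs = []
    for t in range(steps):
        states = array_step(states, (u >> t) & 1, (v >> t) & 1, (q >> t) & 1)
        outputs.append(states[0][9])
    return outputs
```

`array_outputs(23, 23, 11, 11)` reproduces the output column of Table 1, `[0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0]`. Feeding the same inputs to `array_step` four times leaves $M_1$ in state `(3, 1, 1, 1, 1, 0, 0, 1, 0, 1)`, which is the time-4 entry of the table.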